
Calhoun 

Institutional Archive of the Naval Postgraduate School


Calhoun: The NPS Institutional Archive 
DSpace Repository



Theses and Dissertations 


1. Thesis and Dissertation Collection, all items 


2016-03 

Crowdsourcing physical network topology 
mapping with net.Tagger 


Woodman, Daniel Glenn 

Monterey, California: Naval Postgraduate School 


http://hdl.handle.net/10945/48495 


This publication is a work of the U.S. Government as defined in Title 17, United 
States Code, Section 101. Copyright protection is not available for this work in the 
United States. 

Downloaded from NPS Archive: Calhoun 



DUDLEY 

KNOX 

LIBRARY 


http://www.nps.edu/library


Calhoun is the Naval Postgraduate School's public access digital repository for research materials and institutional publications created by the NPS community. Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first appointed (and published) scholarly author.

Dudley Knox Library / Naval Postgraduate School 
411 Dyer Road / 1 University Circle
Monterey, California USA 93943 







NAVAL 

POSTGRADUATE 

SCHOOL 


MONTEREY, CALIFORNIA

THESIS 


CROWDSOURCING PHYSICAL NETWORK TOPOLOGY 
MAPPING WITH NET.TAGGER 

by 

Daniel Glenn Woodman 
March 2016

Thesis Advisor: Robert Beverly 

Second Reader: Justin P. Rohrer


Approved for public release; distribution is unlimited 







REPORT DOCUMENTATION PAGE 

Form Approved OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instruction, 
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments 
regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Washington 
headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and
to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503. 

1. AGENCY USE ONLY (Leave Blank) 

2. REPORT DATE 

03-25-2016 

3. REPORT TYPE AND DATES COVERED 

Master’s Thesis 09-29-2014 to 03-25-2016 

4. TITLE AND SUBTITLE 

CROWDSOURCING PHYSICAL NETWORK TOPOLOGY MAPPING WITH NET.TAGGER

5. FUNDING NUMBERS 

N66001-2250-58231 

6. AUTHOR(S) 

Daniel Glenn Woodman 



7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 

Naval Postgraduate School 

Monterey, CA 93943 

8. PERFORMING ORGANIZATION REPORT 
NUMBER 

9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) 

Department of Homeland Security 

245 Murray Lane SW, Washington, DC 20528 

10. SPONSORING / MONITORING 

AGENCY REPORT NUMBER 

11. SUPPLEMENTARY NOTES 







The views expressed in this document are those of the author and do not reflect the official policy or position of the Department of 
Defense or the U.S. Government. IRB Protocol Number: N/A. 

12a. DISTRIBUTION / AVAILABILITY STATEMENT 

Approved for public release; distribution is unlimited 

12b. DISTRIBUTION CODE 

13. ABSTRACT (maximum 200 words) 






Despite significant research, the challenge of mapping the physical topology of large networks remains a relatively unsolved 
problem. Although it possesses numerous ramifications for Internet security and resiliency, physical network geolocation research has 
not matched corresponding advancements made in logical topology mapping. This thesis proposes net.Tagger: a novel approach to 
network infrastructure mapping that combines smartphone apps with crowdsourced collection to gather data for offline aggregation 
and analysis. The project aims to build a map of physical network infrastructure such as fiber-optic cables, facilities, and access 
points. The net.Tagger project aligns to the OpenStreetMap project, a proven, open-source framework for managing crowdsourced 
map data. This thesis delivers a working proof-of-concept system for further research, including a smartphone app for gathering 
physical topology data, and the backend services to process and store it. We also present the results of an initial release to 25 users, 
analysing collection trends and extrapolating to predict potential findings of a future large-scale release. 

14. SUBJECT TERMS 

Internet, network mapping, physical network topology, Internet backbone, crowdsourcing 

15. NUMBER OF 
PAGES 111 







16. PRICE CODE 

17. SECURITY 

CLASSIFICATION OF 

REPORT 

Unclassified 

18. SECURITY 

CLASSIFICATION OF THIS 

PAGE 

Unclassified 

19. SECURITY 

CLASSIFICATION OF 

ABSTRACT 

Unclassified 

20. LIMITATION OF 
ABSTRACT 

UU


NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89) 


Prescribed by ANSI Std. 239-18 





Approved for public release; distribution is unlimited 


CROWDSOURCING PHYSICAL NETWORK TOPOLOGY MAPPING WITH 

NET.TAGGER 


Daniel Glenn Woodman 

Lieutenant Junior Grade, United States Coast Guard 
B.S., U.S. Coast Guard Academy, 2012 


Submitted in partial fulfillment of the 
requirements for the degree of 

MASTER OF SCIENCE IN COMPUTER SCIENCE 

from the 

NAVAL POSTGRADUATE SCHOOL 
March 2016 


Author: Daniel Glenn Woodman 

Approved by: Robert Beverly 

Thesis Advisor 

Justin P. Rohrer 
Second Reader 

Peter Denning 

Chair, Department of Computer Science 





ABSTRACT 


Despite significant research, the challenge of mapping the physical topology of large networks remains a relatively unsolved problem. Although it possesses numerous ramifications for Internet security and resiliency, physical network geolocation research has not matched corresponding advancements made in logical topology mapping. This thesis proposes net.Tagger: a novel approach to network infrastructure mapping that combines smartphone apps with crowdsourced collection to gather data for offline aggregation and analysis. The project aims to build a map of physical network infrastructure such as fiber-optic cables, facilities, and access points. The net.Tagger project aligns to the OpenStreetMap project, a proven, open-source framework for managing crowdsourced map data. This thesis delivers a working proof-of-concept system for further research, including a smartphone app for gathering physical topology data, and the backend services to process and store it. We also present the results of an initial release to 25 users, analysing collection trends and extrapolating to predict potential findings of a future large-scale release.
CHAPTER 1:

Introduction 


Physical network topology mapping is a counterpart to mapping large-scale networks at more abstract levels. Many research groups have expended substantial effort to map networks at the Internet Protocol (IP) level or higher. These efforts have produced a rich collection of data and tools for understanding the Internet’s virtual structure. However, the underlying physical infrastructure, the cables and the equipment they connect such as routers, data centers, and Internet Service Provider (ISP) Points of Presence (POPs), is not as well understood at a fine-grained level.


1.1 Problem 

It may appear contradictory that current research is better adapted to modeling the fluctuating virtual interconnections of the Internet than the static hardware that carries its traffic, but this is precisely the case, for several reasons. Primarily, mapping is difficult because the physical topology of a network need not match its virtual configuration. For economic, performance, and security reasons, configuring a network to mirror its physical makeup would be ill-advised even if it were possible. Traditional network mapping tools therefore cannot be used for physical analysis without introducing substantial sources of error.

An additional obstacle to the availability of static hardware information is the complex set of relationships among ISPs, public utility managers, and government regulators, which leaves researchers without a centralized source of information. Large swaths of the physical Internet are installed, managed, and regulated by different parties that have little business incentive to communicate beyond their own spheres of influence. Much of the information that would benefit researchers is considered proprietary and is not released by its corporate owners. Certain vendors compile and offer limited maps using ISP data, but this information is usually sold rather than made publicly available. Moreover, data pinpointing static hardware locations reflects what the owner claims is correct and is usually left physically unverified [1]. A final obstacle is that current publicly available data repositories focus almost entirely on core Internet backbone infrastructure [2]. No quality, publicly available, consolidated source of data on low-level infrastructure exists at this time.

As a result of these challenges, large-scale network expansion has largely outpaced researchers’ ability to catalogue it physically. There is a strong argument for addressing these challenges promptly, because understanding the composition and connections of the Internet not only provides valuable theoretical data to computer scientists but is also vital for developing resiliency. Internet services play a central role in the commercial, government, and military sectors, and failures in reliability or performance have potentially serious consequences. Although Internet resiliency affects both the national economy and national security, it cannot be achieved without knowledge of the basic structure of the networks themselves. Because Internet traffic is usually consolidated in transit through several key points on the Internet’s “backbone,” a failure at any of these points could prove catastrophic. Understanding the structure of the Internet gives both government and industry the ability to diagnose weak points and build in redundancy where needed.

Critics of attempts to publicly map key network infrastructure contend that such maps serve as intelligence attackers can use to plan operations. Their solution has been either to discourage extensive mapping or to withhold the results from public release. While a “security through obscurity” approach aligns with conventional military thought, the broader civilian security community regards it as flawed. Its usual counterpoint is that true security lies in finding and fixing flaws rather than hiding them in the hope that they will not be discovered. The research problem is large enough that multiple approaches from different research teams, building on and collaborating with one another, are necessary to yield results. Such efforts cannot exist without open exchange and publication of results.

Critics should also remember that threats to infrastructure come not only from intentional sources such as terrorist attacks but also from accidents and natural disasters. Undersea Fiber Optic Cables (FOCs) are frequently severed by boat anchors and other causes. In 2008, three cables were severed within days of each other in the Mediterranean and Middle East, reducing traffic capacity in some areas by up to 70% [3]. Natural disasters such as hurricanes and earthquakes can cause similar damage on land. Because these vulnerabilities do not depend on anyone's knowledge of them, declining to map them, or keeping such maps secret, provides no protection. They present risks similar to intentional attack or sabotage, and the best remediation is awareness of network structure, so that the network can be analysed for vulnerabilities and those vulnerabilities corrected.


1.2 Research Question 

This thesis seeks to investigate several questions: 


• What type and quantity of data must an app transmit to produce a useful data point? Given the constraints on an app's transmissions imposed by available sensor data, privacy concerns, and bandwidth, how can net.Tagger optimize a user’s submission to gain enough information to reliably determine what exists at a given geographical position and what can be extrapolated from it?

• What is the optimal User Interface (UI) to reduce erroneous submissions and provide user feedback? Within the user’s experience, every interaction must produce accurate data and must avoid boring or frustrating the user into abandoning further submissions. Since the average user will not be able to identify telecommunications infrastructure indicators without training, the app must provide basic instructions on what to look for. Furthermore, crowdsourcing relies on the enthusiasm of its users to keep submitting, based on whatever incentive they receive from participating. net.Tagger does not pay its participants; however, initiatives such as the OpenStreetMap (OSM) Foundation have received open source mapping submissions from hundreds of thousands of unpaid volunteers. net.Tagger must therefore provide appropriate intangible incentives or feedback to encourage participation and repeated submissions.

• How feasible is extrapolation from submissions to mapping inferences? net.Tagger works by identifying nodes based on user observations, but creating a map requires some means of correctly connecting those nodes. Based on initial data collection, how difficult is it to generate accurate map inferences?
1.3 Contribution

In addition to investigating the aforementioned research questions, the main contribution of this thesis is the creation of a working app and backend. Usability, data requirements, and analysis of findings are explored; however, this project serves primarily as the inception of net.Tagger, with the intent that future student researchers will develop the initiative into a mature effort that applies a previously unattempted approach to a major outstanding research area.

1.4 Thesis Organization 

Chapter 2 provides an overview of existing physical mapping techniques, the crowdsourced 
mapping community, and telecommunications infrastructure types relevant to this project. 

Chapter 3 lays out the framework of net.Tagger’s different components as well as design 
choices and the actual project development. 

Chapter 4 describes the testing methods used to evaluate net.Tagger and results from initial 
field testing and deployment. 

Chapter 5 presents the conclusions of the project. It addresses the research questions, offering preliminary conclusions about applying crowdsourced mapping to network topologies. Given this project’s scope as the foundation of a larger, ongoing initiative, it also describes future projects and a vision for an eventual large-scale deployment of net.Tagger.
CHAPTER 2:
Background 


2.1 Introduction 

This chapter provides a brief survey of physical network topology mapping topics as they apply to this thesis. The structure of the Internet at the physical level is briefly described, with an emphasis on long-haul FOC conduits and the “Internet backbone.” A number of recent policy decisions are also explored as forces shaping the expansion of large-scale networks, including Dig-Once laws, federal broadband expansion initiatives, and Right-Of-Way (ROW) lawsuits. Finally, historical approaches to physical topology mapping are reviewed, including measurement-based strategies such as Constraint Based Geolocation (CBG) and DNS-Based Router Positioning (DRoP), as well as compilation-based approaches such as the Internet Topology Zoo.

2.2 Physical Internet Design 

2.2.1 Organization 

From a high-level perspective, the Internet can be studied and modeled at several levels [1]. The highest level is modeled in terms of organizations, which we define as self-governing entities that are not subordinate to other organizations. Based on structure and policy, each organization manages one or more Autonomous Systems (ASs), groups of IP prefixes under common routing control. An AS is defined by RFC 1930 [4] as

a connected group of one or more IP prefixes run by one or more network 
operators which has a SINGLE and CLEARLY DEFINED routing policy. 

Because organizations often wish to divide their network assets into subsections to accommodate complex structures and routing policies, a complex organization will often own and operate several ASs. Organizations include not just ISPs but also government and educational institutions, corporate enterprises, and content providers. At the AS level, these provider-level networks peer with each other at Internet Exchange Points (IXPs) and at private points based on policy agreements [5]. The AS level accounts for much of the truth behind the common networking idiom that “traffic does not follow the shortest path between two points, but the cheapest.” At the POP level, an ISP aggregates routers and modems in a physical location (the POP itself) that provides a means for a local network of consumers to connect to the larger Internet backbone. The IP level consists of individually addressable end-hosts, aggregated subnets, and the router-level connectivity that joins the two. The IP-level perspective of large-scale networks is frequently referred to as the “logical” layer, i.e., the organization and interconnections of individual network hosts depend upon the network’s logical configuration rather than their physical location.
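The AS level ties address space to the organizations that route it. As a minimal sketch, assuming a small hypothetical table of prefixes and origin AS numbers (drawn from the RFC 5737 and RFC 5398 documentation ranges, so none of them is a real network), the following Python attributes an IP-level host to the AS announcing the most specific prefix that covers it, using the standard `ipaddress` module. The table and the `origin_as` helper are illustrative only and are not part of net.Tagger.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-origin-AS table (illustration only). Prefixes come
# from the RFC 5737 documentation ranges and AS numbers from the RFC 5398
# documentation range, so none of them identifies a real network.
PREFIX_TO_AS = {
    ipaddress.ip_network("198.51.100.0/24"): 64496,
    ipaddress.ip_network("198.51.100.128/25"): 64497,  # more-specific announcement
    ipaddress.ip_network("203.0.113.0/24"): 64498,
}


def origin_as(address: str) -> Optional[int]:
    """Attribute an IP-level host to the AS announcing its most specific prefix."""
    ip = ipaddress.ip_address(address)
    covering = [net for net in PREFIX_TO_AS if ip in net]
    if not covering:
        return None  # no known prefix covers this address
    # Longest-prefix match: the covering prefix with the longest mask wins.
    best = max(covering, key=lambda net: net.prefixlen)
    return PREFIX_TO_AS[best]


print(origin_as("198.51.100.7"))    # covered only by the /24
print(origin_as("198.51.100.200"))  # also covered by the more-specific /25
```

Longest-prefix match is used because a more-specific announcement overrides a covering one; a real prefix-to-AS dataset would be far larger and loaded from routing data rather than hard-coded.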

Finally, the physical layer consists largely of cables (fiber-optic or copper) and link-layer switching infrastructure. The physical layer can also take other forms, such as satellite Internet, but the core global Internet infrastructure relies on FOC.

2.2.2 Long-Haul Geography 

Because the logical topology of a network can be configured independently of its physical makeup, providers usually employ cost-saving measures to consolidate and share infrastructure. The “Internet backbone” is mostly composed of FOC long-haul conduits, a term that is not precisely defined but can be described generally. One project [6] defined a long-haul conduit, within the scope of its research, as one spanning at least 30 miles, connecting population centers of at least 100,000 people, or housing the cables of at least two providers. The authors define them more informally as “a ‘tube’ or trench specifically built to house the fiber of potentially multiple providers.”
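Because the working definition from [6] is a disjunction, meeting any one criterion is sufficient. The sketch below encodes it as a predicate over a hypothetical `Conduit` record; the field names, and the choice to test the smaller of the two endpoint populations, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conduit:
    """Hypothetical description of one fiber conduit (illustrative fields)."""
    length_miles: float
    smaller_endpoint_population: int  # population of the smaller endpoint city
    provider_count: int  # providers with fiber in the conduit


def is_long_haul(c: Conduit) -> bool:
    """Working definition from [6]: meeting any one criterion is sufficient."""
    return (
        c.length_miles >= 30
        or c.smaller_endpoint_population >= 100_000
        or c.provider_count >= 2
    )


print(is_long_haul(Conduit(12.0, 40_000, 1)))  # meets no criterion
print(is_long_haul(Conduit(5.0, 20_000, 3)))   # shared by three providers
```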

Long-haul conduits are frequently (but not always) placed adjacent to existing transportation infrastructure such as highways and railways. While expanding to meet growing consumer demand, long-haul networks can face legal and logistical difficulties similar to those of other large-scale distribution networks such as railroads, power transmission lines, and petroleum pipelines. The mechanism that traditional utility networks use in many situations is the ROW, an easement between a landowner and a service provider that grants usage rights, but not ownership, over a section of private property. ROWs are characterized by binding legal contracts between the property holder and the service provider that can be overseen by state commerce departments to ensure due process and equity, even in cases involving consensual agreements rather than eminent domain. However, lawsuits by property owners against ISPs show that ROWs were not always observed when long-haul FOCs were laid alongside infrastructure such as rail lines. In 2013, Sprint Communications Co and WilTel Communications were ordered to pay $770,000 to 1,888 Connecticut property owners after the telecommunications providers negotiated with railroad companies to lay FOC along existing ROWs instead of negotiating with the property owners for new easements [7]. Because the ROW contracts granted the railroads permission only to lay and operate tracks, the railroads were not authorized to grant Sprint and WilTel permission to lay cables. Similar suits have been filed around the country, with the Connecticut case being the 35th statewide deal to receive final approval. Although Sprint has used this practice since the 1980s [8], the legal precedent now set by these cases could complicate placing FOC alongside transportation infrastructure in the future, because telecommunications providers will have to obtain separate easements from landholders.


2.2.3 Traffic Consolidation 

Studies of long-haul conduits frequently find that conduit sharing between ISPs is the default practice. One study [6] “observed that 89.67%, 63.28%, and 53.50% of the conduits are shared by at least two, three, and four major ISPs, respectively.” The same study found even more extreme examples, such as the conduit between Portland, OR and Seattle, WA, which housed traffic from 31 separate ISPs. Traffic switching nodes are another point of traffic consolidation.
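The percentages quoted from [6] are fractions of conduits carrying fiber from at least k providers. A sketch, assuming a hypothetical mapping from conduit names to the set of ISPs observed in each, computes the same kind of statistic; the sample data are invented and do not reproduce the figures in [6].

```python
from collections.abc import Mapping


def sharing_fractions(conduit_isps: Mapping, thresholds=(2, 3, 4)) -> dict:
    """Fraction of conduits shared by at least k ISPs, for each k in thresholds."""
    if not conduit_isps:
        return {k: 0.0 for k in thresholds}
    total = len(conduit_isps)
    return {
        k: sum(1 for isps in conduit_isps.values() if len(isps) >= k) / total
        for k in thresholds
    }


# Hypothetical observations: conduit name -> set of ISPs whose fiber it carries.
observed = {
    "A-B": {"isp1", "isp2", "isp3", "isp4"},
    "B-C": {"isp1", "isp2"},
    "C-D": {"isp3"},
    "D-E": {"isp1", "isp2", "isp3"},
}
print(sharing_fractions(observed))  # {2: 0.75, 3: 0.5, 4: 0.25}
```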

Traffic consolidation also takes place at the level of individual conduits via several mechanisms. A single FOC cable contains many individual fibers, each capable of carrying traffic independently of the others. Because installing new cables is expensive, providers can place more traffic on a single fiber simultaneously through Wavelength-Division Multiplexing (WDM). WDM is analogous to Frequency-Division Multiplexing (FDM), since the wavelength and frequency of electromagnetic radiation are inversely proportional; by convention, however, WDM normally refers to infrared signals in optical media such as FOC, while FDM refers to radio frequency signals. By modulating separate data channels onto carrier signals of different wavelengths, WDM permits an FOC operator to send multiple messages simultaneously over the same fiber. At the destination, the signals are separated via bandpass filtering and their messages extracted. Dense Wavelength-Division Multiplexing (DWDM), a subset of WDM, theoretically permits placing up to 100 channels of 10 Gb/s each on a single fiber [9]. With each channel able to carry traffic from different senders running different networking protocols, WDM can consolidate substantial portions of traffic into the same physical conduits.
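The inverse relationship between wavelength and frequency (f = c/λ) and the DWDM capacity figure above can both be checked with a few lines of arithmetic. This sketch assumes a carrier near 1550 nm, a common band for long-haul DWDM, and the 100-channel, 10 Gb/s figure from [9]; the helper names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_to_frequency_thz(wavelength_nm: float) -> float:
    """f = c / lambda, with the wavelength in nanometres and the result in THz."""
    return SPEED_OF_LIGHT_M_PER_S / (wavelength_nm * 1e-9) / 1e12


def aggregate_capacity_gbps(channels: int, per_channel_gbps: float) -> float:
    """Total capacity of one fiber carrying `channels` WDM wavelengths."""
    return channels * per_channel_gbps


# A carrier near 1550 nm corresponds to roughly 193.4 THz (infrared).
print(round(wavelength_to_frequency_thz(1550), 1))
# 100 DWDM channels at 10 Gb/s each total 1,000 Gb/s (1 Tb/s) on one fiber.
print(aggregate_capacity_gbps(100, 10.0))
```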

Another mechanism for moving more traffic through the same physical location is “dark fiber.” Because the high cost of installing FOC lies primarily in excavation, companies frequently install more fiber than necessary in a given conduit, knowing that a certain percentage of fibers will go unused for a time. Business transactions such as mergers and acquisitions among telecommunications companies can also leave providers with extra FOC running through the same conduit as live cables. These unused fibers are commonly referred to as dark fiber and can be leased to customers who desire a greater degree of control over their networks. Whereas WDM technologies offer increased capacity as a service, dark fiber is a physical asset. Leasing dark fiber gives a customer permission to operate the unused fibers as its own, with wide freedom in customizing their configuration.

2.2.4 Federal Initiatives 

To encourage expansion of and competition between broadband providers, President Obama signed Executive Order 13616 [10]: “Accelerating Broadband Infrastructure Deployment.” The executive order provides funding and direction for government agencies to coordinate in streamlining regulatory processes and reducing barriers faced by broadband providers seeking to expand. The Executive Order covered a variety of areas, most notably initiatives known as “Dig-Once” practices [11]. When new broadband infrastructure (usually FOC) is laid underground in urban areas, up to 90% of installation costs are associated with the road excavation itself. This can create prohibitive expenses for ISPs seeking to expand into new areas, and can also prevent new ISPs from entering markets already covered by a single provider, depriving consumers of beneficial competition.
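The 90% figure implies that reusing an existing conduit can cut the cost of a new run by up to an order of magnitude. The sketch below makes that arithmetic explicit under a simplifying assumption: only the excavation share of the cost is avoided, and all other installation costs are unchanged. The dollar figure and function name are hypothetical.

```python
def cost_using_existing_conduit(new_trench_cost: float, excavation_share: float) -> float:
    """Cost of a run when a conduit already exists, so excavation is avoided.

    Simplifying assumption: every non-excavation cost stays the same.
    """
    if not 0.0 <= excavation_share <= 1.0:
        raise ValueError("excavation_share must be between 0 and 1")
    return new_trench_cost * (1.0 - excavation_share)


# Hypothetical $1,000,000 urban build where excavation is 90% of the cost:
# pulling fiber through a pre-laid conduit costs roughly $100,000.
print(round(cost_using_existing_conduit(1_000_000, 0.90)))
```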

Dig-Once initiatives preemptively lay FOC conduits at the same time that new transportation infrastructure such as roads is built. This permits ISPs to expand by running cables through existing conduits, avoiding the high expense of excavating from scratch. Proposals such as HR3805, the Broadband Conduit Deployment Act of 2015 [12], would mandate FOC conduits on federally funded highway construction projects if the area in question is predicted to require broadband infrastructure within the next 15 years [13]. Although HR3805 has not been passed at this time, efforts initiated by EO 13616 are actively developing Dig-Once practices through other channels. As Dig-Once laws are more widely adopted, one side effect will be further consolidation of traffic from multiple providers into the same conduits.

In addition to Dig-Once practices, the Broadband Opportunity Council (BOC) established 
by EO 13616 made other recommendations that will shape the future growth of long-haul 
networks. The BOC’s official report [14] pursuant to EO 13616 laid out several objectives, 
including: 

• Make Federal lands and assets available for conduits.

• Standardize permitting and regulation, shifting it to the federal level to reduce burdens on local government and provide uniformity across state, local, and tribal boundaries.

• Emphasize broadband as an eligible and desirable funding target for community and 
regional infrastructure development projects. 

• Collaborate with the private sector to reduce barriers to market entry and incumbent 
expansion for broadband providers. 

Because federal efforts related to EO 13616 are still in their preliminary stages as of early 
2016, most details regarding how government and commercial industry plan to implement 
and manage Dig-Once and related policies are not yet resolved. Timelines laid out by the 
BOC aim to resolve most details and begin implementing practices by the end of 2016. 
Regardless of their eventual form, federal efforts in this domain will only serve to increase 
the complexity of the national networking landscape, accelerating the need for improved 
understanding of both long-haul and lower level topologies. 
2.2.5 Resiliency

The driving force for improved understanding of physical networks from a national security perspective is resiliency. With vital services in the financial, medical, energy, and transportation industries increasingly dependent on network connectivity, disruptions have potentially disastrous ramifications. Over a sufficiently long period, some number of localized disruptions from man-made or natural sources is inevitable. This forces government overseers and commercial providers to forgo the pursuit of a perfect design in favor of one that can sustain damage and dynamically adapt to minimize downtime.

While traffic consolidation is an effective business strategy for scaling up network capabilities while maximizing profit, it comes at a price. When network traffic is concentrated in a limited number of physical locations, infrastructure disruptions can produce larger outages than they would in a more decentralized topology. During research for his book on the physical Internet, author Andrew Blum [15] visited a number of these locations, remarking at one that:


This [room] was the main access point for Milwaukee’s municipal data 
network, connecting libraries, schools, and government offices. Without it, 
thousands of civil servants would bang their computer mice against the desk 
in frustration. All this talk about Homeland Security, but look what someone 
could do in here with a chainsaw. 

Damage to vital network infrastructure does not come only from malicious actors. In 2001, a CSX freight train derailed in Baltimore’s Howard Street tunnel [16], causing a massive fire that burned for hours despite extensive efforts by emergency response personnel. In addition to causing property damage, the crash and subsequent fire severed an FOC conduit carrying Internet traffic from several providers as well as a large telephone FOC line. Although Internet access was largely unaffected in many Washington, DC areas, traffic between DC and West Coast locations such as San Diego slowed by up to a factor of 10 in some cases. To restore redundancy, a team of telecommunications workers and city officials had to excavate the street in four locations to clear blockages and route 24,000 feet of FOC through manhole-accessible conduits over 36 hours.
Natural disasters also pose a threat to networks that lack resiliency and redundancy. A Federal Communications Commission (FCC) independent review panel [17] on Hurricane Katrina’s effects on communications networks identified line cuts and a lack of redundant pathways as two causes of the substantial outages accompanying the storm. One example from its findings was a long-haul FOC conduit with a tandem switch inside New Orleans and paths out of the city to the east and west. After the eastern route was cut by a barge blown ashore, the western route was cut, first by falling trees and later by construction crews removing debris from a highway ROW. Damage to a small number of switches in New Orleans affected traffic both inside the city and on conduits linking regions of the country. Accidental fiber cuts by clean-up and response teams were so prevalent that BellSouth reported major routes cut in multiple places, and Cox Communications estimated that 11 days after the storm it had suffered more network outages due to human damage than due to the storm itself.
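The New Orleans tandem switch is an example of what graph theory calls an articulation point (cut vertex): a node whose loss disconnects part of the network. As a sketch of how a physical topology map could expose such weak points, the following Python implements Tarjan's depth-first-search algorithm for articulation points in an undirected graph. The topology and node names are hypothetical and only loosely modeled on the scenario in [17]; this illustrates the idea and is not an analysis performed in [17] or by net.Tagger.

```python
from collections import defaultdict


def articulation_points(edges):
    """Return the nodes whose removal disconnects part of an undirected graph.

    Tarjan's algorithm: track DFS discovery times and "low-links" (the earliest
    discovery time reachable from a subtree using at most one back edge).
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    disc, low, cut = {}, {}, set()
    timer = 0

    def dfs(node, parent):
        nonlocal timer
        disc[node] = low[node] = timer
        timer += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Already visited: a back edge may reach an earlier ancestor.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # A non-root node is a cut vertex if the child's subtree cannot
                # reach anything discovered before this node except through it.
                if parent is not None and low[nxt] >= disc[node]:
                    cut.add(node)
        # The DFS root is a cut vertex only if it has two or more DFS children.
        if parent is None and children > 1:
            cut.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return cut


# Hypothetical topology: local offices hang off one city switch, which has
# redundant east and west routes to a regional hub.
topology = [
    ("local-office-1", "city-switch"),
    ("local-office-2", "city-switch"),
    ("city-switch", "east-route"),
    ("city-switch", "west-route"),
    ("east-route", "regional-hub"),
    ("west-route", "regional-hub"),
]
print(articulation_points(topology))  # {'city-switch'}
```

In this sample the two routes out of the city form a ring through the regional hub, so neither route is a cut vertex on its own, yet every local office still depends on the single city switch. The recursive form is kept for readability; very large graphs would need an iterative version to stay within Python's recursion limit.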


2.3 Physical Topology Mapping History 

Although many questions remain unanswered, physical topology mapping has been the subject of past research. Since the early 2000s, many research groups and private companies have attempted to make progress, with substantial but still limited success. Most research initiatives in this area fall into one of two categories. Measurement-based projects attempt to calculate results directly, normally by sending probes to certain destinations and timing the responses while compensating for errors induced by propagation, queuing, and virtualization. Compilation-based projects gather preexisting data from different sources that independently offer little insight, but that yield new results when combined and analyzed.

Many research projects addressing physical topology mapping are not fully applicable to the problems that projects such as net.Tagger seek to address. Most work focuses on IP geolocation, which seeks to identify the rough geographical position of individual IP addresses or IP subnets. IP geolocation has many commercial applications, including targeted web advertisements, fraud protection, and determining the applicability of interstate or international laws [18]. However, conventional IP geolocation suffers from two shortcomings with respect to physical topology mapping. First, its accuracy is normally too low. Even commercial geolocation services are usually limited to placing IP addresses at zip-code granularity or coarser, which is insufficient for constructing fine-grained maps [19]. Second, much of the infrastructure that researchers want to map exists below the IP layer [6]. The physical infrastructure sought by this project and similar ones cannot be mapped simply by identifying the probable locations of routers or higher-level architecture.


2.3.1 Measurement Based 

One approach to network topology mapping that has been studied and expanded upon for years uses a variety of probes and timing measurements to roughly geolocate individual IP addresses and small subnets. These methods employ a number of “vantage points,” servers (such as PlanetLab nodes) at precisely recorded coordinates, to send probes to target hosts. Signals propagate through FOCs at a relatively fixed speed of about 2/3c; once transmission, processing, and queuing delays are factored in, the effective speed drops to roughly 4/9c.
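Turning a measured RTT into a distance is the basic step behind these methods. The short Python sketch below is our own illustration, not code from any cited tool: it halves the RTT to obtain a one-way delay and multiplies it by an assumed effective signal speed. With 2/3c the result acts as a physical upper bound on how far away the target can be, while the 4/9c rule of thumb gives a tighter but heuristic estimate. The function name and defaults are hypothetical.

```python
# Speed of light in a vacuum, in kilometres per second.
C_KM_PER_S = 299_792.458


def distance_from_rtt_km(rtt_ms: float, speed_fraction: float = 4 / 9) -> float:
    """Distance implied by a round-trip time at an assumed effective speed.

    The one-way delay is taken as half the RTT; the signal is assumed to
    travel at speed_fraction * c (4/9 c by default, 2/3 c for fiber alone).
    """
    if rtt_ms < 0:
        raise ValueError("RTT must be non-negative")
    one_way_s = (rtt_ms / 1000.0) / 2.0
    return one_way_s * speed_fraction * C_KM_PER_S


if __name__ == "__main__":
    for rtt in (1.0, 10.0, 50.0):
        heuristic = distance_from_rtt_km(rtt)
        bound = distance_from_rtt_km(rtt, 2 / 3)
        print(f"RTT {rtt:5.1f} ms: ~{heuristic:7.1f} km at 4/9c, "
              f"<= {bound:7.1f} km at 2/3c")
```

For a 10 ms RTT this gives roughly 666 km at 4/9c and about 999 km at 2/3c, which illustrates why timing alone confines a host only to a broad region.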

The most basic form of timing-based geolocation appeared in early systems such as GeoPing, which observed that if the Round Trip Time (RTT) between two known hosts was similar to the RTT between one of those known hosts and an unknown target host, the target tended to be geographically clustered with the second known host [20]. These techniques relied on a large number of assumptions that their authors readily admitted, but they represented some of the first efforts in IP geolocation in the early 2000s. Accuracy with this basic approach was limited: GeoPing required 7-9 probe sources to achieve accuracy on the order of hundreds of km.
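GeoPing extends this observation by measuring RTTs from several probe sources, so that each host is described by a vector of delays, and then assigning the target the location of the known host whose delay vector is most similar. A minimal Python sketch of that nearest-neighbor rule might look like the following; it assumes Euclidean distance as the similarity measure, and the host names and RTT values are hypothetical.

```python
import math
from typing import Dict, Sequence


def nearest_known_host(target: Sequence[float],
                       known_hosts: Dict[str, Sequence[float]]) -> str:
    """Return the known host whose delay vector best matches the target's.

    Each vector holds RTTs (ms) from the same ordered list of probe sources.
    Similarity is Euclidean distance in this "delay space"; math.dist raises
    ValueError if two vectors have different lengths.
    """
    if not known_hosts:
        raise ValueError("need at least one known host")
    return min(known_hosts, key=lambda name: math.dist(target, known_hosts[name]))


if __name__ == "__main__":
    # Hypothetical RTTs (ms) from three probe sources to hosts at known sites.
    known = {
        "host-east": [12.0, 35.0, 70.0],
        "host-central": [30.0, 10.0, 45.0],
        "host-west": [70.0, 45.0, 8.0],
    }
    target = [28.0, 13.0, 47.0]
    print(nearest_known_host(target, known))  # host-central
```

The estimate can never be more precise than the spacing of the known hosts, which is the discrete-positions limitation that later methods such as CBG set out to remove.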

Fortunately, the past 10-15 years have seen a number of improvements. One of the most important was the publication of CBG in 2004 [21]. Unlike earlier methods, which could only produce a discrete number of possible positions equal to the number of reference hosts, CBG uses multilateration to place a target host in a probable region that need not include any of the reference hosts. Despite representing a substantial improvement with room for growth, CBG is effectively limited to a median accuracy of 228 km. Combining CBG with high-level knowledge of ISP topology gained through other sources led to Topology Based Geolocation (TBG), with an improved median accuracy of 67 km [22]. Further augmentation with knowledge of router locations and demographic data permits tools such as the Octant framework to achieve a median accuracy of 35.2 km [23]. While research continues to improve IP geolocation to the point that it may be used for limited topology discovery [24], it still targets too high a layer of the Internet, too inaccurately, to produce the fine-grained, low-level maps that would prove most beneficial to researchers.
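CBG's multilateration can be approximated by turning each landmark's RTT into a maximum distance and keeping only the locations that satisfy every constraint at once. The Python sketch below simplifies under stated assumptions: it tests a coarse grid of candidate points rather than intersecting circles analytically, uses a fixed 4/9c conversion instead of the per-landmark calibration (“bestline”) that CBG performs, and reports the naive centroid of the surviving points as the estimate. The landmark coordinates and RTTs are hypothetical.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0
C_KM_PER_S = 299_792.458


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def feasible_points(landmarks: List[Tuple[float, float, float]],
                    candidates: List[Tuple[float, float]],
                    speed_fraction: float = 4 / 9) -> List[Tuple[float, float]]:
    """Keep candidates lying inside every landmark's distance circle.

    landmarks holds (lat, lon, rtt_ms); each RTT becomes a maximum radius.
    """
    radii = [(lat, lon, (rtt / 2000.0) * speed_fraction * C_KM_PER_S)
             for lat, lon, rtt in landmarks]
    return [(lat, lon) for lat, lon in candidates
            if all(haversine_km(lat, lon, llat, llon) <= r
                   for llat, llon, r in radii)]


def centroid(points: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Naive lat/lon average; adequate only for small regions."""
    if not points:
        return None
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


if __name__ == "__main__":
    # Two hypothetical landmarks 2 degrees of longitude apart, each 2 ms RTT away.
    landmarks = [(40.0, -100.0, 2.0), (40.0, -98.0, 2.0)]
    grid = [(40.0, -100.0), (40.0, -99.0), (40.0, -98.0), (42.0, -99.0)]
    region = feasible_points(landmarks, grid)
    print(region, centroid(region))
```

Each 2 ms RTT limits the target to about 133 km from its landmark, so only the grid point between the two landmarks survives; a real system would use a much finer grid and many more landmarks.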

Another IP-level geolocation method that augments timing-based approaches is DRoP. DRoP takes advantage of common naming trends within the Domain Name System (DNS), which maps human-readable domain names to network addresses. Although no official naming convention exists for DNS, the hostnames of router interfaces can include descriptive keywords selected by the infrastructure’s owner to assist in organizing and administering its assets. Frequently, at least some of this information includes geographical hints about the location of the physical infrastructure that a DNS entry points to. Most such hints resolve to the city level. Common examples include

• IATA/ICAO codes identifying the largest airport in a city.

• CLLI position codes carrying varying levels of geographic resolution, normally truncated to city/state for domain names.

• UN/LOCODE, identifying specific locations relevant to the shipping and manufacturing industries. Developed by the United Nations Economic Commission for Europe.

• City names or abbreviations.

However, using hostname hints for geolocation is far from straightforward. Many hostnames contain multiple tokens that could be interpreted as location data, with no way to determine whether the hostname owner chose any of them to describe the item’s location. An example given by the Center for Applied Internet Data Analysis (CAIDA) is the hostname ccr.par01.atlas.cogentco.com, which potentially contains a Connecticut airport code (ccr), a reference to Paris (indeterminate country), or a possible reference to Salas Atlas in Spain. All hints point to different locations, and the hostname alone does not reveal enough about the owner’s naming convention to say which, if any, is correct. Despite these ambiguities, DRoP hostname data can still provide useful insights. One approach is to group hints by domain (inferring possible similarities in naming schemes) and then check candidate guesses against timing-based measurements to impose constraints based on latency data. Combining timing measurements with DNS hostnames can provide accuracy down to the level of the hostname hint (usually the city containing the interface); however, DRoP is ineffective if an interface lacks a Fully Qualified Domain Name (FQDN) or if nothing in the hostname matches a known hint. Previous work places the fraction of router interfaces that cannot be classified with DRoP at approximately 45%.
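The first step of this process, extracting candidate hints from a hostname, can be sketched in Python as follows. This is not DRoP's implementation: the dictionary is a hypothetical three-entry stand-in for full IATA, CLLI, UN/LOCODE, and city-name tables, and the tokenizing rules are simplifying assumptions. Run on the CAIDA example, it returns all three candidate tokens, which is exactly the ambiguity that latency constraints must then resolve.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy dictionary; a real system would load complete IATA/ICAO,
# CLLI, UN/LOCODE, and city-name tables.
HINTS: Dict[str, str] = {
    "ccr": "airport code",
    "par": "city abbreviation (Paris)",
    "atlas": "possible place name",
}


def hostname_hints(hostname: str,
                   hints: Dict[str, str] = HINTS) -> List[Tuple[str, str]]:
    """Return (token, meaning) pairs for hostname labels that match a hint.

    Simplifications: the last two labels are treated as the registered
    domain and skipped, labels are split on '-' and '_', and trailing
    digits (the "01" in "par01") are stripped before lookup.
    """
    labels = hostname.lower().rstrip(".").split(".")[:-2]
    matches = []
    for label in labels:
        for token in re.split(r"[-_]", label):
            token = re.sub(r"\d+$", "", token)
            if token in hints:
                matches.append((token, hints[token]))
    return matches


if __name__ == "__main__":
    for token, meaning in hostname_hints("ccr.par01.atlas.cogentco.com"):
        print(f"{token}: {meaning}")
```

Treating the last two labels as the registered domain is wrong for suffixes such as .co.uk; a fuller version would consult a public-suffix list.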

Combining measurement and compilation methods makes it possible to infer relationships beyond the locations of individual network nodes. Giotsas et al. demonstrate a method for mapping AS peering connections to facilities that makes use of several geolocation methods. They begin by manually compiling a database of facilities, such as IXPs, and the networks present at each. This information comes primarily from self-reported data that facilities publish to advertise which networks are available to peer with there.
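A natural first use of such a database, shown here as a simplified Python sketch rather than Giotsas et al.'s full method, is to narrow the set of places where two networks could interconnect to the facilities at which both report a presence. The facility names are hypothetical, and the AS numbers come from the range reserved for documentation. Further measurements would still be needed to choose among the remaining candidates.

```python
from typing import Dict, Set


def candidate_facilities(asn_a: int, asn_b: int,
                         presence: Dict[str, Set[int]]) -> Set[str]:
    """Facilities where both ASes report a presence.

    presence maps a facility name to the set of AS numbers that
    self-report being present there.
    """
    return {fac for fac, asns in presence.items()
            if asn_a in asns and asn_b in asns}


if __name__ == "__main__":
    # Hypothetical facilities; AS numbers come from the documentation range.
    presence = {
        "Facility-1": {64500, 64501},
        "Facility-2": {64500, 64502},
        "Facility-3": {64500, 64501, 64502},
    }
    print(sorted(candidate_facilities(64500, 64501, presence)))
```

Because the input is self-reported, a facility missing from the result may simply be one where a network never advertised its presence.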

2.3.2 Compilation Based 

Another approach to physical topology mapping relies on gathering data from existing sources. Even though central repositories of topology data are not readily available, focused subsets do exist. One source of data is the maps published by Tier-1 ISPs themselves. ISPs frequently distribute rough maps of their central FOC graphs as commercial promotions to demonstrate the scope of their coverage to potential clients. These maps provide a general survey of their routes, but they frequently omit router-level detail, as ISPs consider such information proprietary. Researchers who use them also observe that these maps are sometimes optimistic, oversimplified, or out of date.

Tier-1 ISP maps are still useful to researchers as a starting point. Some projects have successfully started with ISP maps and fleshed out smaller details through clever use of other data sources [6]. A 2015 project [6] combined ISP maps with geocodings from the Internet Atlas Project [2] to create a base map. The researchers then exhaustively gathered publicly available information such as government and municipality records, commercial entity documentation, utility ROW records, environmental impact statements, and fiber-sharing arrangements from states’ Departments of Transportation (DOTs). Through extrapolation and cross-correlation, the team drew a number of conclusions about the state of long-haul FOC infrastructure and the physical-level sharing agreements implemented by ISPs. Provided the underlying documentation and extrapolation assumptions are correct, mapping efforts like these provide a valuable counterpart to error-prone measurement-based techniques. However, the quantity and variety of documentation used for these projects make validating their accuracy infeasible. They also tend to focus on larger Internet backbone infrastructure because their methods and the documentation they rely on do not scale down accurately to more fine-grained levels.

Another area of compilation-based network mapping with a much more established history is that of submarine communications cables. Successors to submarine telegraph and telephone cables, modern submarine FOCs carry the majority of transcontinental Internet traffic. Because of their crucial role in connecting countries to the global Internet backbone, submarine cables are considered vital national assets by many governments. However, submarine cables are frequently damaged by natural phenomena such as ocean currents and earthquakes, or by manmade sources such as anchors, trawling nets, or intentional sabotage. Their importance, vulnerability, and relatively low numbers make submarine cables a sought-after mapping target for telecommunications research firms, which sell maps and data to a variety of customers. Various free sources exist, such as TeleGeography’s interactive online Submarine Cable Map [25]. However, most free maps are deliberately designed with a low level of detail. TeleGeography’s free product is “stylized to improve readability” and “does not reflect the physical cable location.” Its cable landing stations are also “not precise coordinates” and “are meant to serve as a general guide.” More detailed maps and datasets are available from these sources but come with expensive subscription fees and licensing restrictions on use.


2.4 Crowdsourced Mapping 

Much of the initial inspiration for net.Tagger came from the success of crowdsourced mapping projects, the most notable of which is the OSM project [26]. OSM is a worldwide initiative with its origins in Europe, officially supported but not managed by the OSM Foundation. Its goal is to provide a freely available, open source collection of GIS data. Often described as the “Wikipedia of Google Maps,” OSM has over 2.4 million registered users [27] submitting data. OSM users gather data through different means and submit their findings to OSM using one of many available web, desktop, or mobile editor applications [28]. Most of the editors are created through community projects using OSM’s publicly available editing Application Programming Interface (API) and provide experiences designed for subsets of the user base.

Although many different options exist for users to interact with the OSM data set, the three most popular editors are iD, Potlatch2, and JOSM. iD and Potlatch2 are both browser-based editors available directly from the main OSM website’s planet map. They permit users to tag and edit as they interact with a map populated from the entire OSM dataset. Potlatch2 is an older editor that requires Flash browser support; however, it offers more features than iD and is still widely used. iD is JavaScript-based and designed for more novice users, with an emphasis on simplicity. JOSM is a standalone desktop application designed for experienced users, providing customizability through plugins and a broader feature set at the price of a steeper learning curve. JOSM allows users to input large data sets offline, automatically validate them for common errors, and then push edits to the OSM dataset when finished. Although these three editors are the most common among the OSM community, many other open source editing applications make use of OSM’s editing API. OSM’s editor documentation [28] currently lists seven editors apiece for Android and iOS devices. The smartphone editors vary in capability and intent. Some are designed for other Geographical Information System (GIS) purposes and offer limited ability to push edits to OSM, while others are full-featured editors capable of submitting all types of OSM objects from field locations. After OSM received permission to overlay satellite imagery from sources such as Bing Maps on its existing tiles, users became able to visually identify and trace features in these applications without needing to conduct field surveys.
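Whichever editor is used, the submitted data follows OSM's simple model: point features are nodes with a latitude, a longitude, and free-form key=value tags. The Python sketch below builds one such node as XML using only the standard library. It illustrates the data model only: an actual upload goes through the editing API inside a changeset, which this sketch does not attempt. The coordinates are hypothetical, and the tags are meant to resemble OSM's community tagging for telecom manholes; the OSM wiki should be checked before relying on them.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def osm_node_xml(node_id: int, lat: float, lon: float,
                 tags: Dict[str, str]) -> str:
    """Serialize one OSM-style node, with key=value tags, to an XML string.

    A negative node_id marks an object that has not been uploaded yet.
    Tags are written in sorted key order for reproducible output.
    """
    node = ET.Element("node", {"id": str(node_id),
                               "lat": f"{lat:.7f}", "lon": f"{lon:.7f}"})
    for key, value in sorted(tags.items()):
        ET.SubElement(node, "tag", {"k": key, "v": value})
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical telecom manhole near Monterey, California.
    print(osm_node_xml(-1, 36.5970, -121.8743,
                       {"man_made": "manhole", "manhole": "telecom"}))
```

Because tags are open-ended, a project like net.Tagger could in principle record infrastructure indicators with existing OSM conventions rather than inventing a new data format.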

Because OSM relies on the assumption that users will vet data before submitting it, most of its data errors come from inadvertent user mistakes or intentionally placed copyright Easter eggs [29]. The official OSM wiki [30] addresses this issue by noting that even proprietary data sources have errors, including intentional “copyright Easter eggs.” It also discusses the “wikipedia-style model” the project follows, in which each user can add history and submission metadata to their profile’s uploads. OSM claims that because most users are deliberate in their methods and non-malicious, correct data points substantially outnumber incorrect ones, and overlap between user submissions will quickly identify and correct small errors.


Formal analysis of OSM data shows that these claims are reasonably correct, with several caveats. One study [31] compared formal geographical survey data against data from OSM and Tele Atlas, a commercial GIS supplier to many projects including early versions of Google Maps. The analysis found that OSM and Tele Atlas deviated from the survey data by similar spatial amounts. However, OSM showed greater inaccuracies in rural areas, which the study attributed to there being fewer users there than in urban areas, where the OSM error rate was comparatively lower. Another study [32] found that the majority of high-quality OSM submissions came from a core group of “Expert” to “Professional” level users comprising only 3-4% of the OSM user base, with accuracy approaching or matching that of commercial agencies. The lowest levels of participation and submission quality came from the approximately 74% of users classified as “Beginner.” In addition to user-submitted findings, OSM imports data from many other open GIS repositories [33] with the owners’ permission, providing a foundation of data from a multitude of sources, many of which were professionally gathered.

The OSM dataset is used by private citizens, companies, and government agencies for web, desktop, and mobile applications. Proprietary GIS datasets can come with licensing fees, Terms of Service (TOS) agreements, and privacy policies that are incompatible with the fiscal resources or ideological viewpoints of application developers or their user base. Because of its open source philosophy, OSM is free to use and, under the Open Data Commons Open Database License, has a very liberal use policy whose main requirement is attribution to the OSM project (with share-alike terms for derived databases). By contrast, most Google Maps API developer tiers permit small-scale usage for free but begin charging an owner once registered applications using their API key exceed 25,000 queries per day. Although Google offers Quality of Service (QoS) guarantees and additional support with its higher-priced tiers, many small open source projects requiring a GIS dataset are minimally funded and rely on the expertise of their user base for technical support. For them, relying on proprietary systems is infeasible, and OSM data combined with free GIS software libraries allows them to develop at minimal cost. As a result, small independent developers have produced a plethora of OSM-reliant applications, from smartphone navigation apps to online search engines for National Park campsites.

OSM is also used by governments and Non-Governmental Organizations (NGOs) for crisis mapping. After the 2010 Haiti earthquake devastated large swaths of the country, rescue teams were hindered by the lack of accurate, up-to-date maps. OSM volunteers began recording roads based on available Yahoo imagery, and other volunteer teams deployed to Haiti itself to map with OSM techniques. The end result was a highly detailed GIS resource that quickly became the default map for all NGOs in the area as well as for other responding organizations such as the United Nations and the World Bank [34].

Crowdsourcing has also been applied successfully to networking projects. The Portolan project [35], a collaboration between Italian research entities including the University of Pisa, is one such example. Portolan employs a distributed smartphone app framework similar to the one we propose for net.Tagger. It seeks to build maps of mobile device signal coverage and AS-level connections by collecting a combination of passive and active measurements from smartphone sensors. The Portolan app uses geolocation measurements obtained by other onboard phone applications to minimize battery use, correlating them with time-synchronized measurements of phone signal strength [36]. The app also performs traceroutes to target locations after receiving periodic instructions from a central command and control server that also collects and stores data. Portolan’s creators identified three main design goals to encourage user participation: a streamlined and minimal user experience, a low smartphone resource footprint, and user access to a partial results dataset [37]. They selected Android as their initial deployment platform, citing an overall ease of development and distribution that outweighed the difficulties of implementing networking algorithms such as Paris Traceroute. Preliminary analysis of Portolan results showed consistency with a CAIDA traceroute dataset, and even several cases where traceroutes from smartphones running the app traversed routes in the opposite direction from the CAIDA traces, uncovering new router interfaces. Although Portolan is still in its infancy relative to its developers’ eventual objectives, it demonstrates the utility of crowdsourced, smartphone-based network measurements.
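Correlating opportunistic location fixes with separately timed signal-strength readings amounts to a nearest-timestamp join. The Python sketch below is our own illustration of that idea, not Portolan's code; the tuple layouts, the dBm values, and the 30-second matching window are assumptions.

```python
import bisect
from typing import List, Optional, Tuple

Fix = Tuple[float, float]  # (lat, lon)


def pair_with_nearest_fix(
        samples: List[Tuple[float, int]],
        fixes: List[Tuple[float, float, float]],
        max_gap_s: float = 30.0) -> List[Tuple[int, Optional[Fix]]]:
    """Attach the closest-in-time location fix to each signal sample.

    samples: (timestamp_s, signal_dbm); fixes: (timestamp_s, lat, lon),
    sorted by timestamp. A sample with no fix within max_gap_s gets None.
    """
    times = [t for t, _, _ in fixes]
    paired = []
    for t, dbm in samples:
        i = bisect.bisect_left(times, t)
        best = None
        for j in (i - 1, i):  # the fixes just before and just after t
            if 0 <= j < len(times) and abs(times[j] - t) <= max_gap_s:
                if best is None or abs(times[j] - t) < abs(times[best] - t):
                    best = j
        paired.append((dbm, None if best is None
                       else (fixes[best][1], fixes[best][2])))
    return paired


if __name__ == "__main__":
    fixes = [(0.0, 36.60, -121.90), (60.0, 36.61, -121.89)]
    samples = [(5.0, -85), (50.0, -90), (200.0, -70)]
    for dbm, fix in pair_with_nearest_fix(samples, fixes):
        print(dbm, fix)
```

Discarding samples with no nearby fix, rather than interpolating, trades coverage for confidence that each reading is placed correctly.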


2.5 Infrastructure Indicators 

net.Tagger’s basic approach to physical topology mapping relies on a user’s ability to identify street-level indicators of telecommunications infrastructure. This presents two challenges: first, users may have no previous experience spotting infrastructure, and second, most infrastructure is hidden from view and can be identified only through indirect indicators. Most indicators available to common observers signal the presence of FOCs. Exceptions exist, but most sensitive equipment, such as routers or server racks, is secured on private property owned by ISPs. However, the connections between these sites often pass through public space and must have some means for their owners to access them for maintenance. They also must be marked clearly enough that other contractors or utility providers do not inadvertently damage them during construction or operations. Publicly available information on telecommunications markings is limited, but a combination of public utility publications and field research performed for this project has revealed the following targets of interest for net.Tagger.


2.5.1 Orange Markings 

One of the most prevalent and reliable street-level indicators of telecommunications equipment relies on the public utility color-coded system. The system is maintained and promoted by the American Public Works Association (APWA), a non-profit professional organization including both public works agencies and private sector companies that work in the field. The APWA Uniform Color Code [38], laid out in ANSI standard Z535.1: Safety Colors for Temporary Marking and Facility Identification (see Figure 2.1), is not absolutely binding but is followed by most agencies throughout the country for conformity reasons. The purpose of the APWA Uniform Color Code is to standardize the markings that public utility agencies and companies use to identify, and warn each other of, the presence of underground infrastructure by type.
APWA Uniform Color Code for Marking Underground Utility Lines:

• WHITE: Proposed Excavation

• PINK: Temporary Survey Markings

• RED: Electric Power Lines, Cables, Conduit and Lighting Cables

• YELLOW: Gas, Oil, Steam, Petroleum or Gaseous Materials

• ORANGE: Communication, Alarm or Signal Lines, Cables or Conduit

• BLUE: Potable Water

• PURPLE: Reclaimed Water, Irrigation and Slurry Lines

• GREEN: Sewers and Drain Lines

Figure 2.1: Street Markings Color Code. Source: [39]


The most relevant color entry for net.Tagger’s work is orange, specifically color shade PMS 
144. Bright orange markings laid in paint or chalk on roads, sidewalks, or other public 
spaces in the United States are usually a sign that telecommunications equipment is present 
below ground. This can include phone lines, cable TV, or fiber-optic cables. The markings 
vary greatly in style depending on the project, but will frequently be drawn with lines or 
arrows indicating the direction of travel of the cables. Many include amplifying information, such as the ISP that owns the equipment and its particular use. Figure 2.2 and Figure 2.3 show examples.
Figure 2.2: Orange Marking


Figure 2.3: Orange Marking 



Even though phone and cable TV lines are not of primary interest to the net.Tagger project, multiple types of cables are often run together to economize on space, so any orange markings are a desired find. Even better are markings carrying the initials “FOC,” indicating fiber-optic cables. If a marking specifically states fiber-optic, there is a higher probability that the cable carries network traffic rather than other services. Assigning this higher certainty to a find creates a more useful data point for later topology extrapolation.

One other street marking color of lesser significance to net.Tagger is white, indicating “proposed excavation.” Because white does not specify whether the excavation is for telecommunications work or other purposes, white markings alone are of no use to net.Tagger. However, field research conducted for this project frequently found white markings that had been covered over by orange, suggesting that excavation occurred and telecommunications equipment or cabling was installed. This can provide a potentially useful data point about how recent the find is. It is important to note that these criteria do not apply outside the U.S., where different color codes are used. For example, in the UK, telecommunications equipment is identified with the color green, which in the U.S. indicates sewers and stormwater systems.
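The marking rules above can be summarized as a small triage function. The Python sketch below is a hypothetical illustration, not net.Tagger's actual logic: orange markings count as telecommunications indicators, an “FOC” or fiber annotation upgrades them to a likely fiber find, orange painted over white is flagged as possibly recent, and anything else, including any marking outside the U.S., is left unclassified because the APWA color code does not apply.

```python
import re
from typing import Optional, Tuple


def classify_marking(color: str, annotation: str = "", country: str = "US",
                     painted_over_white: bool = False) -> Optional[Tuple[str, bool]]:
    """Hypothetical triage of a street marking for net.Tagger.

    Returns (level, possibly_recent), where level is "fiber" or "telecom",
    or None when these U.S.-specific rules do not identify telecom equipment.
    """
    if country.upper() != "US":
        # Other countries use different codes (green marks telecom in the UK),
        # so the APWA rules below are not applied.
        return None
    if color.strip().lower() != "orange":
        # e.g. white alone means proposed excavation of unknown purpose.
        return None
    text = annotation.upper()
    if re.search(r"\bFOC\b", text) or "FIBER" in text or "FIBRE" in text:
        level = "fiber"
    else:
        level = "telecom"
    # Orange over white suggests excavation followed by a new installation.
    return level, painted_over_white


if __name__ == "__main__":
    print(classify_marking("orange", "AT&T FOC"))
    print(classify_marking("orange", painted_over_white=True))
    print(classify_marking("white"))
```

Returning None for non-U.S. markings conflates “not telecom” with “unknown color scheme”; a fuller design would keep a separate color table per country.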


2.5.2 Duct Markings 

Orange street markings come in a variety of shapes depending on their intended use. One subset of orange markings is of special significance because it indicates a duct carrying a bundle of telecommunications cables. Duct markings can also take several forms, but most consist of several parallel lines or parallel lines boxing in a diamond, as shown in Figure 2.4 and Figure 2.5.
Figure 2.4: Duct Marking

Figure 2.5: Annotated Duct Marking


Duct markings are frequently annotated with the width of the duct (such as “24 inch FOC duct”). The personnel laying down duct markings usually string several markings together in a line, indicating the exact location of the communications channel. Duct markings have the benefit of identifying a greater than usual concentration of telecommunications infrastructure as well as exactly where it leads, giving valuable information to prospective mappers.


2.5.3 Manhole Covers 

Accompanying temporary paint or chalk markings are more permanent infrastructure indicators that serve as access points to equipment for maintenance personnel. The largest and most prominent examples are manhole covers. Although many manhole covers in an urban area provide sewer access, others are devoted to accessing telecommunications equipment. Unlike sewer accesses, which are marked with “Sewer” or “S,” telecommunications manholes bear the name of the provider that operates the underlying equipment. Most also bear a distinctive honeycomb pattern, visible in Figure 2.6 and Figure 2.7, but other categories of manhole covers (such as those used for accessing power equipment) might also have this pattern. In addition to the middle of streets, telecommunications manholes can be found on sidewalks and in the middle of traffic intersections next to sewer accesses.
Figure 2.6: Bell System


Figure 2.7: US West 


Manhole covers do not provide as detailed information as other sources, but they still identify the presence of telecommunications infrastructure at a location. The operator name they bear is also useful data; however, it does not necessarily reflect the current owner if the original owning company was bought or sold.


2.5.4 Handholes 

A less prominent, but often more descriptive, maintenance access point is the handhole. A smaller cousin of the manhole, a handhole is usually found on sidewalks and provides only enough room for a technician to reach inside rather than enter entirely. As with manholes, handholes might be used for different equipment, such as power or water meters. Telecommunications handholes can be marked with the name of their equipment owner, but often bear descriptive names as well (Figure 2.8 and Figure 2.9). Some are stamped with their specific purpose (“Broadband,” “Cable,” or even “Computer”).
Figure 2.8: Communication Handhole

Figure 2.9: Computer Handhole


Others are even larger, approaching the size of manhole covers and bearing additional information such as the ratings of the equipment they protect. Figure 2.10 and Figure 2.11 both show equipment rating labels.



Figure 2.10: Fiber Optic 15/20K

Figure 2.11: SBC NewBasis 20K


Handholes provide similar information to manhole covers, with the occasional bonus of amplifying information.

2.5.5 Dig Warnings 

The infrastructure indicator that most non-technical people are familiar with is the “Call Before You Dig” sign, erected to warn landscapers, homeowners, and contractors about buried hazards such as gas lines. Telecommunications dig signs can frequently be found along roads and are usually small green or gray columns with an orange sign stating “Warning: Underground Cable. Dig Safely” and giving the name of the provider managing the cable. Figure 2.12 and Figure 2.13 show different dig warnings on similar columns.
Figure 2.12: Qwest Warning

Figure 2.13: Century Link Warning (Close-Up)


Although dig warnings might seem to provide a limited amount of information, they sometimes permit helpful data extrapolation. Because FOCs usually (but not always) follow roads, a string of dig warnings along the same section of a main road labeled with the same provider name is a strong indicator of the direction in which the cable runs.
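One way to turn a string of same-provider dig warnings into a direction estimate is to check whether each hop between consecutive warnings points roughly the same way as the line from the first warning to the last. The Python sketch below does this under several assumptions: the warnings all belong to one provider and are listed in the order they occur along the road, distances are short enough for a flat-earth approximation, and the 20-degree tolerance is an arbitrary placeholder.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees


def bearing_deg(p: Point, q: Point) -> float:
    """Approximate bearing from p to q, in degrees clockwise from north.

    Uses an equirectangular (flat-earth) approximation, which is adequate
    over the short distances between neighbouring dig warnings.
    """
    mean_lat = math.radians((p[0] + q[0]) / 2)
    east = (q[1] - p[1]) * math.cos(mean_lat)
    north = q[0] - p[0]
    return math.degrees(math.atan2(east, north)) % 360


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def consistent_heading(points: List[Point],
                       tolerance_deg: float = 20.0) -> Optional[float]:
    """Overall bearing of a warning string, or None if the hops disagree."""
    if len(points) < 3:
        return None  # too few warnings to call it a trend
    overall = bearing_deg(points[0], points[-1])
    for p, q in zip(points, points[1:]):
        if angle_diff(bearing_deg(p, q), overall) > tolerance_deg:
            return None
    return overall


if __name__ == "__main__":
    # Hypothetical warnings from one provider along an east-west road.
    road = [(36.600, -121.900), (36.600, -121.890), (36.600, -121.880)]
    print(consistent_heading(road))
```

A bearing says nothing about which end of the string the cable comes from; that still has to be inferred from other indicators such as manholes or cell towers.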


2.5.6 Cell Towers 

Some cell towers are easily identified by signage placed on surrounding fencing that lists operator names and the tower’s FCC identification number. Figure 2.14 shows a standard cell tower base with its accompanying labeling.
Figure 2.14: Cell Tower Markings


Others are deliberately concealed to blend in with local landscapes and features. In Figure 2.15, a cell tower has been disguised as a tree, although its distinctive base is still visible.



Figure 2.15: Hidden Cell Tower. Source: [40] 
This practice allows providers to place infrastructure in close proximity to urban areas; however, local residents sometimes file lawsuits over supposed health effects [41]. Online communities [42] exist devoted to cataloging examples of cell towers in a variety of disguises ranging from cacti to church steeples. While some are fully concealed, others are still surrounded by standard fencing and FCC markings that a nearby observer can easily identify: the disguised tower in Figure 2.15 retains a recognizable base, and the unconcealed tower in Figure 2.14 demonstrates the full range of labels that might appear. Cell towers are useful in mapping because they are frequently connected to sizable ground FOC lines. Searching the roads and trails surrounding a cell tower usually leads to the discovery of other infrastructure indicators in the immediate vicinity, so cell towers are useful locations to begin a fresh search for infrastructure and good jumping-off points for further investigation.

2.5.7 Buildings 

Buildings holding actual infrastructure equipment such as servers, routers, or data storage are difficult to identify because they are usually well-secured on private property and unmarked. If following FOC trails does lead to identifiable ISP properties, however, a very useful mapping association is made. Because of the potential value of such a find, net.Tagger allows users to submit a building whenever they identify a likely candidate.


2.6 Android Platform Capabilities 

The net.Tagger concept relies on a distributed network of smartphones that can individually collect and submit research data. We use Android for our initial development and release. In addition to comments and other data that users can enter manually, the platform provides the following capabilities.

2.6.1 Location Data 

Android currently offers two location APIs. The first is the stock android.location API [43], which is still supported but is in the process of being phased out. Google recommends that developers use the newer Google Play Location Services API [44], which requires registration with the Play Store but offers better performance, accuracy, and battery usage. Either API can be interfaced with the Google Maps API, which requires additional registration but permits an app to display location overlays directly. Developers can configure “Location Listeners” at runtime that dictate how frequently and precisely the app performs location updates, trading accuracy for battery usage.
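As a rough illustration of that trade-off, the Python sketch below models a filter that enforces a minimum time and a minimum distance between delivered updates, loosely analogous to the minTime and minDistance arguments of android.location.LocationManager.requestLocationUpdates. It is a simplified model under our own assumptions, not Android's actual delivery logic, and the thresholds and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0
Fix = Tuple[float, float, float]  # (timestamp_s, lat, lon)


def distance_m(a: Fix, b: Fix) -> float:
    """Great-circle (haversine) distance between two fixes, in metres."""
    phi1, phi2 = math.radians(a[1]), math.radians(b[1])
    dphi = phi2 - phi1
    dlmb = math.radians(b[2] - a[2])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def filter_updates(fixes: List[Fix], min_time_s: float,
                   min_distance_m: float) -> List[Fix]:
    """Deliver a fix only if enough time AND distance separate it from the last one."""
    delivered: List[Fix] = []
    for fix in fixes:
        if not delivered:
            delivered.append(fix)  # always deliver the first fix
            continue
        last = delivered[-1]
        if (fix[0] - last[0] >= min_time_s
                and distance_m(last, fix) >= min_distance_m):
            delivered.append(fix)
    return delivered


if __name__ == "__main__":
    fixes = [(0, 36.600, -121.9), (5, 36.600, -121.9),
             (20, 36.600, -121.9), (40, 36.601, -121.9)]
    print(filter_updates(fixes, min_time_s=10, min_distance_m=50))
```

Larger thresholds mean fewer delivered updates, which saves battery at the cost of a coarser track; a user standing still while photographing an indicator generates no new updates at all.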


2.6.2 Sensors 

Provided the underlying hardware includes them, an Android smartphone app can collect raw data from many types of sensors [45]. Not all devices contain all possible sensors, and some devices may contain multiple instances of the same sensor with different levels of precision. The Android sensor management packages provide tools for an app to determine which sensors exist on a device, what capabilities those sensors have, and how to register for and read from them. Examples of Android sensors [45] include:


Motion Sensors 

Accelerometers, gyroscopes, and rotation vector sensors that measure acceleration and rotational forces along all three spatial axes.


Environmental Sensors 

Barometers, thermometers, photometers, and related sensors that can measure atmospheric pressure, temperature, illumination, and humidity.


Position Sensors 

Orientation sensors and magnetometers that measure the physical position of a device. 
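Because a device may expose several sensors of the same kind, an app typically enumerates them and picks one. The Python sketch below assumes each sensor is described by a kind, a name, and a resolution (Android reports the last via `Sensor.getResolution()`, where a smaller value is finer); `SensorInfo`, `finest_by_kind`, and `missing_kinds` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SensorInfo:
    kind: str          # e.g. "accelerometer", "magnetometer" (hypothetical labels)
    name: str          # vendor-reported sensor name
    resolution: float  # smallest detectable change in the sensor's units; lower is finer


def finest_by_kind(sensors: List[SensorInfo]) -> Dict[str, SensorInfo]:
    """For each sensor kind present, keep the instance with the finest resolution."""
    best: Dict[str, SensorInfo] = {}
    for s in sensors:
        current = best.get(s.kind)
        if current is None or s.resolution < current.resolution:
            best[s.kind] = s
    return best


def missing_kinds(sensors: List[SensorInfo], required: List[str]) -> List[str]:
    """List required kinds absent on this device, so the app can degrade gracefully."""
    present = {s.kind for s in sensors}
    return [k for k in required if k not in present]
```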


2.6.3 Camera 

Although the Android Camera API permits fine-grained control of any onboard cameras,
Android also provides built-in tools for using basic camera features with minimal effort.
Android documentation recommends that developers determine the role that image collection
plays in their project and use these existing tools unless their app requires a custom camera
configuration. Android permits developers to integrate the stock camera UI, which users
already know, into their apps; this reduces the possibility of user error and of stability
issues accidentally introduced by developers.
CHAPTER 3:

Implementation


3.1 Project Requirements 

The core goal of the net.Tagger project is to obtain GIS data and descriptions of street-level
network infrastructure indicators in sufficient quantity and detail to infer accurate
insights about underlying network topology. net.Tagger will pursue this goal via a distributed
crowdsourcing approach that is easy and fulfilling for the project’s user base. Crowdsourcing
will be implemented via a mobile app. For our purposes, we consider an app to be a
program running directly on a mobile device’s operating system [46]. This is in contrast to
software running on a dedicated computer or through a web browser. Core project requirements
(in no particular order) are:

• The overall app experience should be as streamlined as possible to minimize user
frustration, reduce the app’s learning curve, and increase the likelihood of a user’s
continued involvement in the project. Most users who seek to become involved
will possess some networking knowledge; however, their initial unfamiliarity with
net.Tagger and the project’s target data must be overcome to produce productive
users. A straightforward user experience will lower barriers to entry and reduce
opportunities for a user to execute the submission process incorrectly. Similar to OSM’s
crowdsourcing process, our project model carries the possibility that users will
misinterpret findings or improperly perform submissions. A simple, streamlined user
experience introduces fewer opportunities to perform an erroneous action. Overall,
the app should move a user from identifying a finding to submitting a data point
in as few interactions (such as clicking or entering text) as possible.

• The app must send enough data with each tag submission to provide a useful data point. If
the ultimate goal of net.Tagger is to infer meaningful and accurate network topology
data, certain key pieces of information are necessary for each submission. At a
minimum, a “tag” is a single transaction sending Global Positioning System (GPS)
coordinates, the GPS accuracy at the time of submission, a timestamp, and the user’s
belief about the infrastructure’s type and provider. The user must also be encouraged
to submit images and any miscellaneous observations, providing extra resources for
net.Tagger researchers to verify submission accuracy and make network inferences.

• The app must provide users with text or graphical feedback immediately after
submitting a finding. The feedback will ensure that users see that their action completed,
keeping them invested.

• The app experience must provide users with incentives to continue participating. A
multifaceted approach should be employed to reach users with different motivations.
These can include community prestige through an online leaderboard, small monetary
rewards, or providing access to a portion of the dataset in exchange for
participating. These incentives should be tailored to improve the quality of research data,
such as providing additional rewards for validating existing tags from other users
instead of just submitting original tags.

• The app must operate reliably, handle errors properly, and avoid crashes. Stability
issues are likely to frustrate users, leading to reduced participation or to
quitting the project altogether.

• The app must balance user privacy, data security, and overall usability. The app
should maintain a unique profile for each user that identifies and authenticates their
data submissions, but limit required user information to that necessary for research
purposes. No information should be collected without the user’s knowledge and
consent.

• Data submitted by users must be protected during submission (“in transit”) and in
storage (“at rest”). Data must be secured in transit against an adversary capable of
intercepting cellular signals or sniffing network traffic. Data should be stored on
servers we control, in a manner that is resistant to web and database attacks
(such as SQL injection). No services or databases should be exposed beyond
what is necessary for approved client/server operations, and any additional access must
require administrator credentials.

• Data should be logically organized to facilitate indexing, retrieval, and
interfacing with standard GIS tools such as the OSM software stack. This does not affect
the data collection process, but is required for the eventual data analysis that is the
core goal of net.Tagger. Because the eventual dataset will be very large, it must be
stored in a format that can be efficiently queried by parameters and constraints
via native PostGIS functionality, scripts, and GIS software.

• At a minimum, users should be able to view their own tag history directly from the 
app. Ideally, users should also be able to view the entire set of tags both from the app 
and online if resources permit. 
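The minimum tag described in the requirements above can be sketched as a simple record. The Python below is a hypothetical payload, not net.Tagger's actual wire format: the field names, the range checks, and the JSON encoding are illustrative assumptions.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Tag:
    """Hypothetical minimum tag payload; field names are illustrative only."""
    lat: float                        # WGS 84 latitude, degrees
    lon: float                        # WGS 84 longitude, degrees
    accuracy_m: float                 # reported GPS accuracy radius at submission, meters
    timestamp: str                    # ISO 8601 timestamp, assumed UTC
    infra_type: str                   # user's belief, e.g. "Manhole"
    provider: str                     # user's belief, e.g. "Pacific Bell"
    comment: str = ""                 # optional free-text observations
    image_file: Optional[str] = None  # optional photo reference

    def validate(self) -> None:
        """Reject payloads that could never be a useful data point."""
        if not -90.0 <= self.lat <= 90.0 or not -180.0 <= self.lon <= 180.0:
            raise ValueError("coordinates out of range")
        if self.accuracy_m < 0:
            raise ValueError("accuracy must be non-negative")
        if not self.infra_type or not self.provider:
            raise ValueError("infrastructure type and provider are required")

    def to_json(self) -> str:
        """Validate, then serialize with stable key order."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)
```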


net.Tagger’s design requirements were chosen to support two approaches to user data
collection. As the OSM project demonstrates [32], the most accurate and complete data will
likely be submitted by a small, core section of users. This group will likely possess greater
than average technical knowledge and a willingness to devote blocks of time and effort
specifically to collecting data. These users will be interested in submitting findings that are
not only accurate, but also as complete and informative as possible. If the app offers extra
functionality, they are likely to learn and use it properly. They will also be concerned with
their search coverage, canvassing as large an area as possible without missing or repeating
sections.

Similar statistics on OSM users show that a larger proportion of users will contribute less
frequently and with a higher chance of submitting incorrect or incomplete data. These users
will benefit from a simple experience that requires a minimal amount of time and number of
interactions to submit tags. Their submissions are likely to be made while conducting other
activities, making convenience and usability key to their continued participation. They do
not require complex features, as they are less likely to take the time to learn and use them
regularly.

Most users will not fall explicitly into one of these two groups, but will use a combination
of both methods depending on their lifestyle. A user might perform detailed, structured
data collection for several hours on a weekend but also submit findings as they come across
them during weekday activities. To capitalize on its user base, the app must cater to both
methods. The UI and user experience must be streamlined enough for quick and intuitive
submissions, while still allowing users to track their past submissions and provide
additional details when they have the time and interest to do so.
3.2 App Design


3.2.1 Initial App Design 

During its original development, the net.Tagger app focused on function over form. As the
project evolved and received input from reviewers, several UI necessities became apparent.
Initial iterations of net.Tagger were structured as follows:


• The user began on a “main screen” (Figure 3.1) that linked to pages such as profile 
data, data submission, instructions/examples, and a display of past submissions. 

• After setting up a profile and viewing the training pages, the user spent most of their time
on the data submission page (Figure 3.2) submitting findings.

• To receive any feedback beyond a “Data Submitted” message, the user needed to take 
several extra steps that brought them out of the submission cycle. 


Figure 3.1: Initial Main Screen (menu options: Getting Started, Instructions/Examples, Submit A Finding, View Findings, Settings)

Figure 3.2: Initial Submit Screen (latitude/longitude display, Infrastructure Type, Infrastructure Provider, and comment fields; “Send Data (No Picture)” and “Send Data (Take Picture)” buttons)


The layout was not conducive to a positive user experience and was likely to foster
disinterest and frustration. The barebones prototype was adequate for initial development, but
did not meet all design requirements.


3.2.2 Refined App Design 

• The main page (Figure 3.3) that the user “lives in” was changed to include the
submissions map. This ensures that the user constantly sees their previous tags and is
immediately shown the result of a tag submission as a new map marker. The user
can also watch their position marker move around the map, filling in blank spaces
with fresh findings. This provides constant feedback without moving to a new app
screen.

• Tasks such as submitting data, modifying profiles, and viewing infrastructure
indicator examples were moved to pop-up activities that display over the main app screen
(Figure 3.4). The user does not have to click through multiple screens to accomplish
basic tasks, reducing time away from the main screen. All interactions take place
from a single, central screen.
Figure 3.3: Refined Main Screen

Figure 3.4: Refined Submit Screen


Figure 3.3 shows a user’s main screen after two hours of tagging in downtown Salinas, CA. 

3.2.3 Platform Selection 

Because crowdsourcing depends on reaching the largest possible user base, net.Tagger
would ideally be developed for multiple smartphone platforms. However, confining
the project to a single platform for initial research phases facilitates testing non-app
components without the substantial workload of deploying on different
platforms. A mature project can only be created through continual deployment and testing that
reveals issues needing resolution. This necessitates choosing a single smartphone
platform for initial app development before porting to others. Early testing before wide-scale
deployment does not rely on reaching a broad user base, placing a premium on ease of
development rather than overall market share. After considering available options,
Android and iOS emerged as the most viable platforms for an initial net.Tagger app.
Android’s documentation, developer community, open source philosophy, and distribution
system made it an ideal development platform. Although either option would have worked
well, easy integration with tools such as the Google Maps API and the Google Play Store
reduced many project requirements to previously solved problems. We leave expansion to
iOS as future work.

3.2.4 User Interface 

The most important iteration in the evolution of the UI was bringing feedback to the
forefront of the user experience. Early versions were successful in gathering
data during local field tests; however, the testing was carried out by project members with
external motivation to continue submitting data. With this configuration, a normal user
without any explicit ties to the project would be expected to spend time walking around
urban areas entering data about their finds without receiving immediate feedback beyond
a “Data Submitted” app message. Most users would quickly grow disillusioned with this
configuration, feeling they were performing unpaid labor with little incentive to continue.
A successful crowdsourcing project depends upon users feeling invested in a common goal,
and the early app UI did not accomplish this.

Several different solutions to the user feedback problem were evaluated for feasibility
versus payoff. For example, an approach requiring minimal effort would be to run scripts on
the net.Tagger Virtual Private Server (VPS) to let users download a Keyhole Markup
Language (KML) record of their submissions to view in Google Earth via a tablet or PC. This
basic solution permits the user to view submissions, but only after returning from gathering
data and completing several steps. We posit that a dedicated group of users might be
willing to perform these extra tasks to view the results of their efforts, but this might discourage
more casual users. It also violates our design requirements that emphasize a streamlined
process with immediate, automated user feedback.
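Generating the KML record mentioned above is straightforward. The Python sketch below emits one KML 2.2 Placemark per tag; the actual server scripts were PHP, and `tags_to_kml` with its `(name, lat, lon)` input is a hypothetical helper. Note that KML writes coordinates in longitude,latitude[,altitude] order, a common source of swapped markers.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"


def tags_to_kml(tags: Iterable[Tuple[str, float, float]]) -> str:
    """Build a KML document with one Placemark per (name, lat, lon) tag."""
    kml = ET.Element("kml", xmlns=KML_NS)  # default namespace declared on the root
    doc = ET.SubElement(kml, "Document")
    for name, lat, lon in tags:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = name
        point = ET.SubElement(placemark, "Point")
        # KML orders coordinates as longitude,latitude[,altitude]
        ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```

The resulting string can be saved with a .kml extension and opened in Google Earth.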

Another prototyped solution kept a KML file on the user’s phone to record submissions
locally in addition to sending them to the net.Tagger backend server. After making a series
of captures from the “Data Submit” page, the user could return to the main page
and select a “View Submissions” option. This would launch Android’s Google Earth app
(assuming the user had it installed) and load the local KML file, displaying
the user’s submission history as a series of map markers overlaid on a global map. This
approach provided the same map view as the previous option, but instantly and on the
smartphone itself. The user no longer needed to download a file and could view their map
between submissions, even while gathering data. However, this design had several drawbacks.

Due to the design of the Android OS, opening Google Earth and populating it with
net.Tagger data was a trivial task. However, if Google Earth was already running
in the background when the user tried to view submissions in net.Tagger, no new data would
be loaded. As a stopgap, the app displayed a message reminding users to close any
instances of Google Earth before viewing tag submissions. Counting on users to follow
extra instructions for a basic feature to work properly is inadvisable and risks frustrating
them. A good UI design should present immediate feedback, within one to two seconds,
every time a user performs a task, particularly a data submission. Although this design
was an improvement over the initial layout, it still required a user to submit tags from one
screen, navigate to the main page, leave the app to check the Task Manager, return to the
app, and select “View Submissions,” which opened an entirely separate app (Google Earth) to
finally display findings.

The final UI layout came about after gathering feedback from test users, some of whom
had prior app development experience. The most important design decision was changing
the workflow to shift the submission map from a secondary feature to the app’s primary
focus. All previous iterations of the app required the user to begin at a main page and
navigate between separate pages to submit and view findings. A streamlined design made
the submission map the main page, with the user navigating to other pages from the
map screen. This was made possible through integration with the Google Maps API. Using
an Android MapView as the background of the main page makes the user’s default view
a map overlay that shows their position and instantly populates itself with markers
after each submission. An eventual development goal is to populate each user’s in-app
map with a rough representation of the entire net.Tagger dataset, showing them all covered
and uncovered regions. However, implementing this feature in the app’s initial release was
infeasible due to time constraints, so a local map of the individual user’s finds was added
instead.

Another goal of the final UI was to minimize the user’s separation from the map screen,
both in time and in “apparent distance.” To achieve this, the other app activities (data
submission, profile management, etc.) were changed from fully separate screens to pop-up
windows accessible from the map interface. The map becomes the only full-screen activity
in the entire app and remains visible in the background during other tasks. This results in a more
interactive interface that provides immediate and continual feedback. The new layout also
naturally encourages users to cover a wider area. Lacking an informative layout, users
might concentrate their search efforts in a single area or accidentally revisit locations. By
confronting users with a constant reminder of how their submissions are grouped relative
to their current location, the layout leads most users to gravitate naturally toward new areas.


3.2.5 User Training 

Crowdsourcing is a medium that produces reasonably reliable results when applied to tasks
that do not require specialized knowledge. Burnap et al. [47] applied crowdsourcing to
engineering design problems with objectively quantifiable answers to study its effectiveness
in scenarios requiring technical knowledge. They observed above-average results when
experts within the participant base were identified and their contributions
weighted more heavily. However, failing to do so negated most benefits of crowdsourcing,
because clusters of consistently incorrect participants cancelled out contributions from
more knowledgeable persons. This suggests that raising the knowledge level of a user base
should be a priority for technical crowdsourcing projects. Since net.Tagger is available to
the general populace, relying excessively on users to make technical decisions increases
the probability that they will submit incorrect results. Fortunately, net.Tagger users do
not need to understand most of the networking theory discussed in Chapter 2. As long as
users are able to identify the infrastructure indicators discussed in Section 2.5 and understand the
relevance of utility markings and infrastructure provider names, they will usually be able
to perform accurate assessments. To train users, the app has a “Training and Examples”
section (Figure 3.5) that lays out identifying information, sample images, and examples of
helpful user comments for each infrastructure indicator type.
Figure 3.5: Examples Screen
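The effect Burnap et al. describe can be illustrated with a weighted majority vote. The Python sketch below is not their method; it only shows how a cluster of consistent but incorrect votes outweighs a single expert unless the expert's vote carries more weight. `weighted_vote` and the per-user weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_vote(votes: List[Tuple[str, str]],
                  weights: Dict[str, float],
                  default_weight: float = 1.0) -> str:
    """Return the label with the highest total weight.

    votes:   (user_id, label) pairs
    weights: hypothetical per-user expertise weights; unknown users get default_weight
    """
    totals: Dict[str, float] = defaultdict(float)
    for user, label in votes:
        totals[label] += weights.get(user, default_weight)
    # Iterating labels alphabetically makes ties resolve deterministically
    return max(sorted(totals), key=lambda lbl: totals[lbl])
```

With equal weights, three novices agreeing on a wrong label beat one expert; giving the expert a weight above three reverses the outcome.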


Additional means of validating submissions are a priority for future net.Tagger research.

While it is inevitable that some level of user misunderstanding will lead to erroneous
submissions, crowdsourcing possesses natural error-correcting mechanisms. Because users
can only view their own previous submissions and not those of others, multiple users
investigating the same area are likely to tag the same object independently. The set of
submissions for a single infrastructure indicator will then contain several that agree with each
other, pointing toward the correct data. Furthermore, even if a user is wrong about their
submission, the combination of an image with its GPS coordinates will be enough for researchers
to extract some level of information. These redundancies reduce the level of training that most
users will require for the project to collect usable research data.
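The redundancy argument above implies grouping nearby submissions and measuring agreement within each group. The Python sketch below assumes tags are `(lat, lon, type)` tuples and uses a greedy, fixed-radius grouping with the haversine distance; the radius and function names are hypothetical, and a production pipeline would more likely use a PostGIS spatial query or a proper clustering algorithm.

```python
import math
from collections import Counter
from typing import List, Tuple

Tag = Tuple[float, float, str]  # (lat, lon, infra_type); hypothetical minimal record


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in meters."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def group_tags(tags: List[Tag], radius_m: float) -> List[List[Tag]]:
    """Greedily assign each tag to the first group whose seed tag lies within radius_m."""
    groups: List[List[Tag]] = []
    for tag in tags:
        for group in groups:
            seed = group[0]
            if distance_m(seed[0], seed[1], tag[0], tag[1]) <= radius_m:
                group.append(tag)
                break
        else:  # no nearby group found: start a new one
            groups.append([tag])
    return groups


def consensus(group: List[Tag]) -> Tuple[str, float]:
    """Most common type in a group and the fraction of its tags that agree."""
    label, count = Counter(t[2] for t in group).most_common(1)[0]
    return label, count / len(group)
```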
3.3 Backend Services


3.3.1 Requirements 

Due to the simplicity of the net.Tagger app, most web architectures and frameworks could
be adapted to handle and store collected data. As for any project, the server-side
implementation must be reliable and secure. Finally, all components must provide appropriate
GIS capabilities where needed, as well as the means to maintain compatibility with other
GIS projects such as OSM. Factors such as datum, map projection, coordinate systems,
and time zones must be accounted for to ensure that the collected dataset can be compared
to and combined with those from other sources. Currently, net.Tagger relies on technologies
such as Google Maps for most of its GIS data collection and display. However, as
the project eventually moves to other platforms such as iOS, net.Tagger aims to shift to
open source, platform-agnostic tools for tasks such as rendering. The selected architecture
should be easy to migrate to other tools and platforms without extensive redesign.
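As an example of the projection and time zone issues, Google Maps displays data in spherical Web Mercator (EPSG:3857), while GPS fixes are WGS 84 latitude/longitude (EPSG:4326). The Python sketch below applies the standard Web Mercator formulas and normalizes timestamps to UTC. It assumes timezone-aware timestamps, and the function names are hypothetical.

```python
import math
from datetime import datetime, timezone
from typing import Tuple

WEB_MERCATOR_R = 6_378_137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857
MAX_LAT = 85.05112878         # latitude at which Web Mercator becomes square


def to_web_mercator(lat_deg: float, lon_deg: float) -> Tuple[float, float]:
    """Project WGS 84 degrees to spherical Web Mercator meters (x, y)."""
    if abs(lat_deg) > MAX_LAT:
        raise ValueError("Web Mercator is undefined near the poles")
    x = WEB_MERCATOR_R * math.radians(lon_deg)
    y = WEB_MERCATOR_R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def to_utc_iso(ts: datetime) -> str:
    """Normalize a timezone-aware timestamp to an ISO 8601 string in UTC."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return ts.astimezone(timezone.utc).isoformat()
```

Storing raw WGS 84 coordinates and UTC timestamps, and projecting only for display, keeps the dataset comparable with other sources.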


3.3.2 Database Selection 

Most GIS projects use an SQL-type database to store data. net.Tagger was heavily
inspired by OSM and is designed to remain compatible with it for future research efforts,
making OSM’s software choices relevant to this project. While OSM does not officially
endorse a specific software stack, the majority of its users, including the core OSM
distribution, rely on a popular GIS add-on to PostgreSQL known as PostGIS.

PostgreSQL (abbreviated as Postgres) is a powerful Object-Relational Database
Management System (ORDBMS) that largely complies with the SQL standards and provides many advanced
features. While Postgres supports basic geometric data types, it lacks built-in support for
handling spatial data and spatial operations. Fortunately, Postgres is designed to be easily
extensible. In 2001, the company Refractions Research released the first iteration of an add-on
named PostGIS to provide basic spatial types. PostGIS has since continued to add features
that not only aid data storage but also provide tools for querying and analyzing geospatial data.
These capabilities extend beyond those available with more conventional GIS storage types,
which are limited in their ability to store accompanying metadata or large quantities of data.

Most OSM users use PostGIS in conjunction with the OSM project’s custom GIS
formats, particularly the OSM XML format and its variants. The OSM XML file format is a
human-readable representation of OSM data. The OSM project hosts free copies of .osm
files for most countries and states online, including a master planet.osm file containing all
collected data the project possesses. At the time of writing, planet.osm is approximately 50
GB compressed, expanding to over 500 GB uncompressed. Since plaintext XML is
not an efficient storage medium, binary and compressed representations of .osm files also
exist. For practical use, software packages such as the popular osm2pgsql tool can take
.osm files as input and insert the bulk data into a PostGIS database. However, the findings
and metadata collected by net.Tagger are not best expressed in the table format used
by packages such as osm2pgsql, because these combine most metadata into a single “tags”
column that does not permit querying individual elements as directly as dedicated columns.
Since net.Tagger metadata such as infrastructure provider and infrastructure type must be
directly queryable, that format is not ideal for this project. Thus, net.Tagger takes a middle
ground, using a PostGIS database that stores appropriate data in individual columns but keeps data
such as lat/long coordinates in the same format as OSM databases. The resulting database is
suited to the project’s specific research needs while retaining the ability to interact with other
data sources through existing GIS software.
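The column-per-attribute design can be sketched with Python's built-in `sqlite3` as a stand-in for PostGIS. The table and column names below are hypothetical, and sqlite has no geometry type, so coordinates are plain REAL columns where a PostGIS schema would use a geometry column. The point is that `infra_type` and `provider` can be filtered and grouped directly.

```python
import sqlite3

# Hypothetical schema; a PostGIS version would store position in a geometry column.
SCHEMA = """
CREATE TABLE tags (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    infra_type  TEXT NOT NULL,  -- its own column, unlike a packed "tags" column
    provider    TEXT NOT NULL,
    lat         REAL NOT NULL,
    lon         REAL NOT NULL,
    accuracy_m  REAL,
    created_utc TEXT NOT NULL
);
CREATE INDEX tags_provider_idx ON tags (provider);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def count_by_provider(conn: sqlite3.Connection, infra_type: str) -> dict:
    """Count tags of one infrastructure type per provider using the dedicated columns."""
    rows = conn.execute(
        "SELECT provider, COUNT(*) FROM tags WHERE infra_type = ? GROUP BY provider",
        (infra_type,),
    )
    return dict(rows.fetchall())
```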


3.3.3 Scripts 

Server-side processing is performed through a series of PHP scripts. PHP was chosen due to
its ease of deployment, preexisting code body, and user community. While some consider
PHP to present security risks when deployed in large-scale, complex web applications,
most reported PHP security flaws stem not from inherent technical flaws but from poor
coding practices. To mitigate this, many features exist to perform sensitive processes such
as password validation or database operations without requiring developers to implement
them manually and risk doing so improperly. Server operations in net.Tagger are limited,
primarily restricted to user credential validation, receiving GIS data and photographs,
and performing database storage operations. All of these are well-understood processes
with established best practices. Because net.Tagger does not have a web presence
with complicated user interaction needs, PHP is an appropriate option that supports the quick
development time the project requires.
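One such feature is the prepared statement: in PHP this is typically done through PDO, and the same idea in Python's `sqlite3` looks like the sketch below. The `users` table and `find_user` helper are hypothetical. The driver sends the bound value as data, so input such as `x' OR '1'='1` cannot change the structure of the SQL statement.

```python
import sqlite3
from typing import Optional, Tuple


def create_demo_db() -> sqlite3.Connection:
    """Hypothetical users table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
    return conn


def find_user(conn: sqlite3.Connection, email: str) -> Optional[Tuple[int, str]]:
    """Look up a user with a bound parameter instead of string concatenation."""
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?",  # placeholder, never an f-string
        (email,),
    ).fetchone()
```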
3.3.4 Security Considerations

net.Tagger was intentionally designed to limit the amount of sensitive data it transmits
and stores. This limits the project’s security requirements to following best practices
and using built-in features of its native software packages. All user submissions, including
profile data, tag data, and images, are sent via HTTPS POST messages using Android’s
built-in security certificates. User sessions are recorded and authenticated via session keys
in keeping with basic web application principles, and user passwords are stored in hashed
and salted form. Because of numerous incidents in which PHP developers improperly designed
their own password handling procedures, PHP now automates the entire process, requiring only a
single function call to store or validate a password and leaving little room for error. Most
important is the decision to limit user metadata. Users are identified via a valid email address
and their country of origin, limiting the cost of a potential security compromise. As a
crowdsourcing operation, net.Tagger only needs to track users to the extent required
for statistical metrics and for recognizing high contributors via a leaderboard.
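PHP's `password_hash()` and `password_verify()` wrap salting, hashing, and constant-time comparison in single calls. The Python sketch below shows what those calls take care of, using PBKDF2-HMAC-SHA256 from the standard library rather than the bcrypt default that `password_hash()` uses. The storage format, iteration count, and function names are assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # hypothetical work factor; tune to server hardware


def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$hash_hex' using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, hash_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), hash_hex)


def new_session_key() -> str:
    """Unpredictable token of the kind used as a session key for later requests."""
    return secrets.token_urlsafe(32)
```

Because each hash gets a fresh random salt, two users with the same password still get different stored values.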

3.3.5 Scalability 

A successful crowdsourcing operation depends by its very nature on the ability to offer its
services to a variable number of users. Depending on the size of its objective, the desirable
number of participants will usually be very large. OSM boasts a sizable user base, with
usage statistics [27] at the end of 2015 reporting over 2.5 million registered users, with
over 10,000 actively contributing data weekly and 60,000 monthly. Many of the most
active users were submitting on the order of several hundred new nodes per day. Even
more impressively, most reported OSM metrics showed exponential growth over a several-year
period. Because this thesis is intended as net.Tagger’s inception, certain compromises
must be made in terms of resources and scalability. Its backend services reside on a VPS
that can handle a reasonable number of app transactions but would fail under the
load of larger projects such as OSM. The server’s resources can be scaled up to an extent,
but operating at a higher scale would likely require a distributed solution. Similarly, the
architecture choices described earlier emphasize quick development turnaround,
which does not always result in optimization for large-scale deployment. This project’s
choices closely mirror the archetypal Linux Apache MySQL PHP (LAMP) stack, with
PostgreSQL/PostGIS substituted for MySQL, placing it on par with many other web-services
projects. Additional improvements to net.Tagger’s web services will likely accompany
the project’s expansion. Likewise, the Google Maps API key that the app relies
upon for generating its UI can only manage 25,000 requests per day before Google begins
charging proportionately to the request rate.
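For scale, 25,000 requests per day averages to about 1,040 requests per hour. A simple way to anticipate charges is to count requests per day and warn before the free limit is reached. The Python sketch below is hypothetical: the class name, the warning fraction, and how a "day" is bucketed are assumptions, and Google defines its own quota reset boundary.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """Hypothetical counter that flags when a daily request budget is nearly or fully used."""

    def __init__(self, limit: int = 25_000, warn_fraction: float = 0.8):
        self.limit = limit
        self.warn_at = int(limit * warn_fraction)
        self.counts = defaultdict(int)  # day -> requests observed that day

    def record(self, day: date, n: int = 1) -> str:
        """Record n requests on `day`; return "ok", "warn", or "over"."""
        self.counts[day] += n
        used = self.counts[day]
        if used > self.limit:
            return "over"
        if used >= self.warn_at:
            return "warn"
        return "ok"
```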

net.Tagger will initially be deployed with the understanding that it will not scale in its
current state. This thesis is designed to produce a proof-of-concept with limited release as
part of a long-term, multiple-researcher project. Aiming for a fully fleshed-out first release
does not allow for feedback or course adjustments until a prohibitive amount of time and
resources have been expended. Because net.Tagger is unlikely to see widespread adoption
until it is released on several different smartphone platforms and bundled with user incentive
mechanisms, the current server backend will likely be sufficient for the near future. Any
scalability issues that arise will indicate larger user adoption than anticipated, which would be
a sign of success. They will be resolved as they present themselves through further student
research projects and, eventually, by seeking sponsorship funding after demonstrating the
utility of crowdsourced network mapping.
CHAPTER 4:
Testing and Results 


This chapter presents results from net.Tagger’s initial release. We give overall metrics
for the current dataset, analyzing tagging trends by type, provider, location, and
inter-event delay. Specific examples of high-quality tags are discussed, including ones that
use net.Tagger’s unique capacity to capture low-permanence infrastructure indicators.
We demonstrate tag validation through Google products and manual image inspection,
categorizing submissions by accuracy for future research. Finally, we discuss examples of
erroneous net.Tagger user submissions, including methods for identifying errors and
extracting useful information from incorrect tags.

Since the proposal stage of this thesis, its primary focus has been providing a working
proof-of-concept app/server framework. Because crowdsourced network mapping is a
largely untested concept in the larger research community, much of the net.Tagger project
thus far has been aimed at identifying target data and refining the collection process.
Section 2.5 discusses the results of the former, and Chapter 3 describes the latter. However,
even though this project’s primary goal is not data collection, a discussion of its preliminary
results is still relevant to demonstrate the utility of the net.Tagger implementation and show
what analysis will be possible after its future widespread release. Another valuable set of
results comes from our initial user community’s experience. Feedback on users’
experiences provides metrics about net.Tagger’s usability and whether portions of its design
enhance or detract from gathering useful data.


4.1 Initial Release 

While net.Tagger’s eventual goal is to infer physical network topology, this requires a fairly
complete tag set for a given geographical area. Time and resources did not permit an app
release on a large enough scale to accomplish actual topology mapping. Without complete
coverage of an area, it is difficult to state whether a series of tags demonstrates a unique
underlying network feature or whether further mapping of the surroundings would show a uniform
distribution of more tags without useful trends. At the time of this writing, net.Tagger
is still in its beta testing phase, and the main intent of this limited release is identifying
and correcting performance and stability issues that did not appear during development.
Releasing only to family members, Naval Postgraduate School (NPS) students, faculty,
professional colleagues, and friends with a clear description of the project’s current status
increases the likelihood of helpful user feedback. Skipping this step and pushing the app to
as large an audience as possible without a smaller initial release would likely result in many
of the target users discovering net.Tagger, experimenting briefly, and then uninstalling the
app out of frustration over its unpolished appearance and function.

Overall statistics for the project at this time are as follows: 

Table 4.1: High-Level net.Tagger Statistics

Metric                    Value
Copies Distributed        25
Profiles Created          12
Contributing Users        9
Total Tags                166
Tags w/ Image             101
Total Providers           18
US States Represented     5
Countries Represented     2
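The totals in Table 4.1 also support a few derived rates. The short Python sketch below computes them from the table alone; the dictionary keys are our own labels, not field names from the net.Tagger database.

```python
# Totals copied from Table 4.1; the key names are illustrative labels.
STATS = {
    "copies_distributed": 25,
    "profiles_created": 12,
    "contributing_users": 9,
    "total_tags": 166,
    "tags_with_image": 101,
}


def derived_rates(stats):
    """Derive simple participation and image-attachment rates from the totals."""
    return {
        # Recipients who never created a profile.
        "no_profile": stats["copies_distributed"] - stats["profiles_created"],
        # Fraction of profiles that submitted at least one tag.
        "profile_to_contributor": stats["contributing_users"] / stats["profiles_created"],
        # Fraction of all tags that include a photo.
        "image_rate": stats["tags_with_image"] / stats["total_tags"],
    }


rates = derived_rates(STATS)
print(rates["no_profile"], f"{rates['profile_to_contributor']:.1%}", f"{rates['image_rate']:.1%}")
```

This prints `13 75.0% 60.8%`: 13 recipients never created a profile (the same figure as the number of users reported as declining to participate), three quarters of profiles produced at least one tag, and about 61% of tags carry an image.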


The two most common reasons given by the 13 users who declined to participate were
“insufficient personal time to participate” and “no iOS version of app.” The following
figures display trends among the 9 contributing users. The distribution in Figure 4.1 parallels
those of similar projects analyzed in Section 2.4.
Figure 4.1: CDF of Tags by User


Even with a small sample size, a trend is clearly visible: a large number of users
accounted for a small portion of the total tags, while a small number of users contributed
the majority. Out of 166 tags, the top three users submitted 133, with the most active user
submitting 101. Presumably, this trend will continue as net.Tagger scales up in size.
Assuming rough equivalence with OSM participation rates, we can anticipate most tags
coming from a core of 5-10% of users, with the rest of the user base submitting at
lower rates.
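Curves such as Figure 4.1 are empirical CDFs over per-user tag counts. A minimal Python sketch of that computation follows, together with a top-k share helper. Per-user counts are not published here, so the final lines use only the aggregates stated above (166 tags in total, 133 from the top three users, 101 from the most active user); the function names are our own.

```python
from typing import List, Sequence, Tuple


def ecdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical CDF: for each distinct value x, the fraction of samples <= x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        # Record a point only at the last occurrence of each distinct value.
        if i + 1 == n or xs[i + 1] != x:
            points.append((x, (i + 1) / n))
    return points


def top_k_share(counts: Sequence[int], k: int) -> float:
    """Fraction of all tags contributed by the k most active users."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


# Aggregates reported in the text (per-user counts are not listed).
TOTAL_TAGS, TOP_USER, TOP_THREE = 166, 101, 133
print(f"top user: {TOP_USER / TOTAL_TAGS:.1%}, top three: {TOP_THREE / TOTAL_TAGS:.1%}")
```

The last line prints `top user: 60.8%, top three: 80.1%`: three of nine users (a third of contributors) produced roughly four fifths of all tags.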

In Figure 4.2, we examine the number of distinct infrastructure types (manhole, duct, etc.)
tagged per user.
Figure 4.2: CDF of Infrastructure Types by User


The maximum number of infrastructure types was 5, which 11.1% of our users reached. 
In examining this metric, we seek to determine whether some users tag only one type of 
infrastructure (perhaps because of where they live, or what they commonly notice), or are 
adept at tagging many or all of the types of infrastructure in which we are interested. We 
observe a generally uniform distribution of infrastructure types, suggesting that our user 
base does not exhibit any particular bias in the tag types. In Figure 4.3, we examine the 
number of different infrastructure providers in each user’s set of tags. 
Figure 4.3: CDF of Infrastructure Providers by User
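The per-user counts behind Figures 4.2 through 4.4 reduce to counting distinct values of one field per user. A sketch in Python, assuming tags are available as dictionaries; the keys `user_id`, `type`, and `provider` are hypothetical column names, and the sample records are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def distinct_per_user(tags: Iterable[Mapping[str, str]], field: str) -> Dict[str, int]:
    """Number of distinct values of `field` (e.g. type, provider, zip) each user tagged."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for tag in tags:
        seen[tag["user_id"]].add(tag[field])
    return {user: len(values) for user, values in seen.items()}


# Invented sample records for two hypothetical users, "a" and "b".
SAMPLE = [
    {"user_id": "a", "type": "Manhole", "provider": "Bell"},
    {"user_id": "a", "type": "Manhole", "provider": "Unknown"},
    {"user_id": "a", "type": "Handhole", "provider": "Bell"},
    {"user_id": "b", "type": "Duct Marking", "provider": "Unknown"},
]
print(distinct_per_user(SAMPLE, "type"))      # {'a': 2, 'b': 1}
print(distinct_per_user(SAMPLE, "provider"))  # {'a': 2, 'b': 1}
```

Feeding the resulting counts into an empirical CDF yields one curve per field. One caveat for providers: every free-text “other” provider collapses into a single value unless the comments are parsed, so this count may understate provider diversity.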


Users were able to choose from six major providers, “unknown,” or an “other” option where
the user notes the name of the provider in their comments. The six specific providers were
selected based on informal analysis of the most common providers encountered during
initial fact-finding research, with the intent of expanding and tailoring the app’s options
in future releases. Of the eight available options, the maximum number of providers was
five, achieved by 33.3% of users. Every user who submitted more than 10 tags fell into
this category. This result suggests that users who contribute beyond a certain minimum
threshold will encounter a diverse set of providers, even if they limit themselves to one
geographic location. Fully 88.9% of users submitted at least one “other” tag, specifying an
additional provider, and 66.7% of users submitted at least one “unknown” provider
tag. In Figure 4.4, we examine each user in terms of how many zip codes they submitted
tags from.
Figure 4.4: CDF of Zipcodes by User


Although zip codes are defined and modified according to multiple criteria in addition to
geographical zoning [48], they correspond to location and population distribution, providing
a useful approximation of a user’s tagging locations. Google’s Geocoding API [49] provides
a reverse geocoding lookup feature that we utilized for this analysis. The service
accepts simple HTTP requests with a tag’s latitude and longitude as URL parameters and returns
JavaScript Object Notation (JSON) data including a zip code with suffix; we automated these
lookups to simplify analysis. The maximum number of zip codes for an individual user was
four, which 11.1% of users achieved. The same number of users submitted from only one
zip code, with all others tagging from two or three. This suggests that even users with a small
number of tags still exhibit some geographical diversity while remaining
relatively local.
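The reverse-geocoding step can be sketched in Python using only the standard library. The endpoint and the `latlng` and `key` parameters follow Google’s published Geocoding API, and the parser reads the `postal_code` and `postal_code_suffix` address components; the function names, the API key, and the synthetic response used for testing are our own placeholders, and the automation script actually used for this analysis may differ.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def reverse_geocode_url(lat: float, lng: float, api_key: str) -> str:
    """Build a reverse-geocoding request with the tag's coordinates as URL parameters."""
    query = urllib.parse.urlencode({"latlng": f"{lat},{lng}", "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def zip_from_response(payload: dict) -> Optional[str]:
    """Return 'ZIP' or 'ZIP-SUFFIX' from the first result that carries a postal code."""
    for result in payload.get("results", []):
        parts = {}
        for component in result.get("address_components", []):
            for kind in component.get("types", []):
                parts[kind] = component.get("short_name")
        if "postal_code" in parts:
            suffix = parts.get("postal_code_suffix")
            return f"{parts['postal_code']}-{suffix}" if suffix else parts["postal_code"]
    return None


def lookup_zip(lat: float, lng: float, api_key: str) -> Optional[str]:
    """Fetch one result over the network (needs a valid key) and extract the zip code."""
    with urllib.request.urlopen(reverse_geocode_url(lat, lng, api_key), timeout=10) as resp:
        return zip_from_response(json.load(resp))
```

Keeping the parser separate from the network call makes it testable against saved responses; since each lookup counts against an API quota, caching results per coordinate pair is also sensible.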

Overall, infrastructure providers, types, and zip codes all showed fairly uniform distributions.
This might suggest that the variety of providers and tag types scales up as users
expand their geographical area of coverage. However, the sample size is too small to be
conclusive at this point.

Figure 4.5 shows per-user delays between sequential tagging events. 



Figure 4.5: CDF of Tagging Delay 


It suggests that most users submit tags in rapid succession, several minutes apart, and then
are inactive for several hours or days. This matches one pattern of use envisioned in
Section 3.1, in which users allot dedicated periods of time to tagging instead of making
periodic submissions over a longer period. Most users at this time are gathering evidence
at the direct request of the net.Tagger team, which likely takes the form of dedicated
tagging trips. Another possible explanation for this trend is that upon spotting one possible
submission, users become aware of other possible tags in the area, temporarily increasing
their vigilance. If future research confirms this hypothesis, some type of user notification
when entering high-density areas might provide a similar effect. This idea is explored more
thoroughly in Chapter 5. User submission periods might become more regular as the
net.Tagger community grows and community incentives are introduced. Further research is
necessary to determine whether these inferences about behavior are correct or whether
current conditions of data collection are artificially introducing them.
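The delays in Figure 4.5 can be computed by sorting each user’s timestamps and differencing consecutive entries. A minimal Python sketch follows. It assumes timestamps in the format shown in the database extracts in this chapter (for example, `2016-02-08 16:54:36-05`, with an hour-only UTC offset); the function names and the (user, timestamp) input shape are our own.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def parse_pg_timestamp(text: str) -> datetime:
    """Parse a timestamp such as '2016-02-08 16:54:36-05' into an aware datetime."""
    text = text.strip()
    # strptime's %z expects +HHMM, so pad an hour-only offset such as '-05' to '-0500'.
    if len(text) >= 3 and text[-3] in "+-" and text[-2:].isdigit():
        text += "00"
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")


def tagging_delays(events: Iterable[Tuple[str, str]]) -> List[timedelta]:
    """Delays between consecutive tags by the same user; events are (user_id, timestamp)."""
    by_user: Dict[str, List[datetime]] = defaultdict(list)
    for user_id, stamp in events:
        by_user[user_id].append(parse_pg_timestamp(stamp))
    delays: List[timedelta] = []
    for stamps in by_user.values():
        stamps.sort()
        delays.extend(later - earlier for earlier, later in zip(stamps, stamps[1:]))
    return delays
```

Converting the delays to minutes and plotting their empirical CDF on a logarithmic axis makes both the minute-scale bursts and the multi-day gaps visible on one curve.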


4.2 Quality Examples 

Some of our 166 tags serve as examples of ideal net.Tagger submissions by combining multiple
indicators. They provide extra context about their surrounding areas even if the location
has not been exhaustively covered by net.Tagger users, permitting preliminary inferences
about underlying network topology.

The following submission images are presented with their verbatim database extracts,
representing the sum total of information available to us about a specific tag. Fields containing
Personally Identifiable Information (PII) are censored in this section for privacy reasons.
Entries observe the following format:

Table 4.2: Database Entry Format

(Tag ID, TXID, User ID, Lat, Long, Timestamp, Provider, Type, Comments)
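Entries in this format can be split with Python’s `csv` module, since text fields are double-quoted and embedded quotes are doubled (as in the `2-4""ducts` comment in Section 4.6). The sketch below assumes the parenthesized, comma-separated layout of the extracts in this chapter and treats an empty or literal `null` comment as missing; the `TagRecord` field names are our own. Latitude and longitude stay as text because the published extracts are partially censored.

```python
import csv
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagRecord:
    tag_id: str
    txid: str
    user_id: str
    lat: str          # text, since published extracts are partly censored
    long: str
    timestamp: str
    provider: str
    tag_type: str
    comments: Optional[str]


def parse_entry(row: str) -> TagRecord:
    """Parse one '(Tag ID, TXID, User ID, Lat, Long, Timestamp, Provider, Type, Comments)' row."""
    inner = row.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    fields = next(csv.reader([inner]))
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    *head, comments = fields
    # Treat empty or literal 'null' comments as missing.
    missing = comments.strip().lower() in ("", "null")
    return TagRecord(*head, comments=None if missing else comments)
```

For example, the duct-marking entry in Section 4.6 parses to a `TagRecord` whose `tag_type` is `Duct Marking` and whose comment is `2-4"ducts`.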


Figure 4.6 combines three features in one: a duct marking, a “telephone” manhole cover, 
and an orange “COMM VAULT” marking. 


(XX,A3746D62E7381E3D4141B903CEBFC5C0FB39DC20,XXX@XXX,XX5892028,-XXX5903990,"2016-02-08 16:54:36-05",Unknown,Manhole,"Possibly AT&T")

Figure 4.6: Communications Vault with Duct


Even though networking equipment is not specifically referenced, FOCs carrying network
traffic are frequently co-located with phone lines due to the high expense of laying new 
ducts. The markings and manhole access indicate some sort of central node, and the duct 
marking gives context about how it connects to other nodes. 

Figure 4.7 demonstrates a desirable net.Tagger datapoint by combining FOC ducts with a 
building of some sort. 


(91,C3ADC0DA3F8E36E09E67BC636AED99DD5F654505,XXX@XXXX,XX5893242,-XXX5906732,"2016-02-08 16:56:20-05",Unknown,"Orange Marking (misc.)","Possibly AT&T")

Figure 4.7: Duct with Building


The user did not tag the building separately and likely did not recognize the potential utility
of doing so; however, their tag image shows the connection. It is possible that the duct simply
passes under the building and the two have no relation, but their association increases
the likelihood that the building houses some type of networking equipment. net.Tagger
researchers would flag this as a location of interest and monitor the area for other tags
indicating additional FOCs or access points, looking for clues that the structure is a local nexus
of networking infrastructure.
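Monitoring a flagged location for nearby tags amounts to a radius query. A minimal Python sketch using the haversine great-circle distance on a spherical Earth (mean radius 6,371 km) follows; the function names, the (tag_id, lat, lon) tuple layout, and any radius chosen are illustrative assumptions rather than part of the current net.Tagger backend.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def tags_near(center: Tuple[float, float],
              tags: Iterable[Tuple[str, float, float]],
              radius_m: float) -> List[Tuple[str, float]]:
    """(tag_id, distance_m) for every tag within radius_m of center, nearest first."""
    hits = []
    for tag_id, lat, lon in tags:
        d = haversine_m(center[0], center[1], lat, lon)
        if d <= radius_m:
            hits.append((tag_id, d))
    return sorted(hits, key=lambda hit: hit[1])
```

For a dataset of this size a linear scan is adequate; a much larger dataset would call for a spatial index, such as a PostGIS geography index or a geohash-based lookup.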


4.3 Low-Permanence Indicators 

A unique capability of net.Tagger is its ability to capture infrastructure indicators with
low persistence. While other mapping projects described in Chapter 2 target large, static
networking features such as railroad ROWs, net.Tagger can capture infrastructure with
relatively short-lived indicators when users are in the area tagging. Such “low-permanence”
indicators primarily concern FOC cables and ducts, which are valuable mapping data for
connecting network nodes. Because they are marked with chalk or street paint, FOC markings
exist for short amounts of time, but are much more likely to indicate current information
than other indicators. Figure 4.8 and Figure 4.9 show examples of this phenomenon.



Figure 4.8: Orange Marking and TV Pedestal, Bark and Grass 



Figure 4.9: Duct Marking, Grass 


Because these examples are placed over grass and bark dust, they possess a low persistence 
and will soon disappear from sight. As net.Tagger’s community increases in size, its ability 
to capture temporary indicators will correspondingly grow as well. 
4.4 Tag Verification

Because the initial net.Tagger release only featured 12 users (9 actually contributing) spread
across 5 states, there were no cases of two users tagging the same finding. However, a
number of submissions were at least partially verifiable by searching the tag’s latitude and
longitude in Google Earth and trying to match results against the user-submitted tag image.
This approach is potentially time consuming, as it requires manual human validation for each
tag, and it is not always successful if the target is out of sight from the Google Earth/Street View
reference point. Because of these complications, manual verification is best reserved for
case-by-case use by net.Tagger researchers who identify certain tags as highly relevant for
area-specific inferences. Despite its shortcomings, we successfully employed manual
verification for both urban and rural locations to demonstrate its utility. An urban
example of this process involves a tag in downtown Cambridge, MA. Figure 4.10 shows
the image submitted by the user, which, as a manhole stamped with “Communication,”
appears to meet all criteria of a good tag.



Figure 4.10: User-submitted Image 


If net.Tagger researchers believed verification of this tag was necessary before relying on it
for further inferences, it could be investigated via Google Earth’s Street View feature. Figure
4.11 shows the overhead view of the tag’s coordinates on the left (Marker #113) and the
Street View on the right.



Figure 4.11: Google Earth at Image Coordinates 


Even at a lower resolution, several manholes are clearly visible that appear to match the
user’s tag image in Figure 4.10. While not as conclusive as a matching tag from another user,
this provides at least partial confirmation of the tag.
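Loading tags into Google Earth as numbered markers, as in Figure 4.11, can be done by exporting them as KML, the OGC-standard XML format that Google Earth opens directly. How the markers in this chapter were produced is not stated, so the Python sketch below is one possible approach; the (tag_id, lat, lon) input and the `#ID` naming are assumptions. Note that KML orders coordinates as longitude, latitude, altitude.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"


def _kml(tag: str) -> str:
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def tags_to_kml(tags: Iterable[Tuple[int, float, float]]) -> str:
    """Serialize (tag_id, lat, lon) tuples as KML placemarks named '#<tag_id>'."""
    ET.register_namespace("", KML_NS)  # emit xmlns="..." instead of an ns0: prefix
    root = ET.Element(_kml("kml"))
    document = ET.SubElement(root, _kml("Document"))
    for tag_id, lat, lon in tags:
        placemark = ET.SubElement(document, _kml("Placemark"))
        ET.SubElement(placemark, _kml("name")).text = f"#{tag_id}"
        point = ET.SubElement(placemark, _kml("Point"))
        # KML coordinate order is longitude,latitude[,altitude].
        ET.SubElement(point, _kml("coordinates")).text = f"{lon},{lat},0"
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a `.kml` file and opening it in Google Earth shows one placemark per tag, which researchers can then compare against overhead and Street View imagery.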

This approach can even work in rural areas. One user submitted two tags within minutes
of each other in the middle of a forest on the Monterey Peninsula. The user tagged a
cell tower (Figure 4.12) and a Pacific Bell handhole near each other in an area away from all
other structures except a construction site.

Figure 4.12: Cell Tower, User Submitted


Because cell towers often connect to adjacent FOC lines, the combination of a tower and
handhole in a more remote area is an important finding. When interviewed, the user confirmed
this finding, stating that he discovered the tags while trail running. Even if the user
had not been available for comment, Google Earth could still have provided initial confirmation.

Figure 4.13 shows the Google Earth coordinates of the cell tower tag (Marker #103) and
Pacific Bell handhole (Marker #104).

Figure 4.13: Cell Tower, Google Earth


Although the imagery is low resolution, Google Earth clearly shows the cell tower’s profile rising
out of the forest in the exact location where the user’s image and tag place it. It is not possible to
make out the handhole, but verifying one submission increases the chance that another tag
from the same user several minutes later is valid as well.

An additional verification process that focuses on the tag’s specific traits instead of its 
location involves checking the user’s description of the item against the user-submitted 
image. This is only possible if the user chooses to submit an image with their tag, which 
will eventually be incentivized as discussed in Chapter 5. Infrastructure provider, type, and 
comments can all be vetted against a submission image by a net.Tagger researcher in a brief 
amount of time and the tag reliability ranked accordingly. For this thesis, we ranked tags 
against their images according to the following categories: 


• All data fields concurred with image. In Figure 4.14, the infrastructure type and 
provider are clearly visible and concur with the user’s form submission. 
('XX', '5EDDE570778C03D96FD378CBF012853BDAEA3309', 'XX@XXXX', 'XX5545949', '-XXX6765036', '2016-02-14 10:47:03-05', 'Bell', 'Manhole', 'null')



Figure 4.14: Bell Manhole 


Although the user could have clarified “Bell System” in his comments, the tag entry
is still complete and contains no misleading or incorrect information. Submissions
in this category are confirmed by their images. In our initial dataset, 77 of 101 image
submissions fell into this category.


• Some data fields are incorrect; however, the image contains enough information that
any errors are immediately apparent. Figure 4.15 shows a submission described by
the user as a handhole operated by an unknown provider.


('XXX', 'EAF724412CD9EC5D3456D5924CF42AB5366D32E7', 'XXX@XXX', 'XX5757070', '-XXX9336365', '2016-03-02 16:53:21-05', 'Unknown', 'Handhole', 'null')

Figure 4.15: Mislabeled Manhole


A cursory inspection of the image shows a manhole instead of a handhole, which
the user has misidentified. However, the discrepancy is immediately apparent, and
the tag can be quickly updated by net.Tagger researchers with no loss of information
due to the error. The image even has enough resolution to zoom in and read the
inscription “Bell System,” meaning that researchers can also replace the user’s “Unknown”
provider. Submissions in this category are confirmed, corrected, and potentially
improved by their images. In our initial dataset, 11 of 101 image submissions fell
into this category.

• No discrepancies between data fields and the image are visible; however, the submission
form data contains information not verifiable from the image. Tags in this category
are more complicated to categorize. The difficulty arises from the fact that net.Tagger
researchers do not know whether the extra information in the form is due to factors
not visible in the image or represents a user error. Figure 4.16 shows a submission
where the user identified an orange marking and specified “Comcast” as the provider
in the tag comments section.
(XX, '75295219C09E3A5884AB85F5A3121E11D86A9607', 'XXX@XXXX', 'XX7180402', '-XXX6330881', '2016-02-29 17:24:18-05', 'Other (note in comments)', 'Orange Marking (misc.)', 'Comcast cable')



Figure 4.16: Indeterminate Orange Marking 


The image clearly shows a duct marking, indicating that the user partially identified
the correct infrastructure type. However, the user’s rationale for submitting Comcast
is not readily apparent. Normally, provider information for an orange marking would
be painted on the ground or not marked at all. Because the image does not include the
provider name in the marking, the submission raises the question of whether the user
knows something not included in the image or is mistaken. net.Tagger
researchers possess a large enough sample set to determine fairly accurately from a
well-taken image what information is or isn’t available, and this image seems to lack
the information a user would need to accurately specify a provider. After reaching out
to the user, we determined that the marking led to a residence serviced by Comcast;
thus, the submission was accurate. If this additional validation step were not available,
the apparent discrepancy between form data and image would have forced net.Tagger
researchers to partially downgrade the submission, keeping the infrastructure type
but classifying the provider as “unknown.” Submissions in this category might be
partially invalidated by their images but still contain some useful information on a
case-by-case basis. In our initial dataset, 6 of 101 image submissions fell into this
category.


• The image contains enough information to determine that the submission does not
represent a valid net.Tagger data point. A detailed treatment of this category is
given in Section 4.6. User-submitted images provide the most reliable means to vet
net.Tagger data through this process. In our initial dataset, 7 of 101 image submissions
fell into this category. It is important to note that these erroneous submissions
are not necessarily due to user incompetence or a misunderstanding of net.Tagger
principles. Users are subject to their own time constraints while participating and are
not expected to be subject matter experts. Many of our test users expressed concern
about potentially submitting erroneous data, and we assured them that providing images
along with their tag data would give the net.Tagger team the means to vet their
finds. The limited scope of this project allows us to tightly control more variables
than a full-scale release would, a feature we took advantage of by instructing our users that
when in doubt about a finding, they should submit anyway. This helps fulfill one
of this project’s objectives by revealing the ability of an average user to correctly
identify Internet infrastructure.
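The four categories partition the 101 image submissions, and their shares follow directly from the counts above. A small Python tally; the enum names are our own shorthand for the categories.

```python
from enum import Enum


class ImageCheck(Enum):
    CONFIRMED = "all data fields concur with the image"
    CORRECTED = "errors are apparent and fixable from the image"
    UNVERIFIABLE = "form data not verifiable from the image"
    INVALID = "image shows the tag is not a valid data point"


# Counts reported above for tags submitted with an image.
COUNTS = {
    ImageCheck.CONFIRMED: 77,
    ImageCheck.CORRECTED: 11,
    ImageCheck.UNVERIFIABLE: 6,
    ImageCheck.INVALID: 7,
}

total = sum(COUNTS.values())
assert total == 101, "the four categories should partition the image submissions"
for category, n in COUNTS.items():
    print(f"{category.name:<12} {n:>3}  {n / total:.1%}")
```

The shares come to roughly 76%, 11%, 6%, and 7%. Confirmed and corrected tags together account for 88 of 101 image submissions (about 87%), all of which end up consistent with their images.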
4.5 Tag Comments

Much of the need for projects such as net.Tagger comes from the large variety of competing
and overlapping telecommunications providers who communally own and operate the
Internet backbone’s infrastructure. As different corporations change ownership, merge,
acquire new assets, and lease infrastructure to others, the infrastructure indicators targeted by
net.Tagger have the potential to become increasingly obfuscated. The “comments” section
of a net.Tagger app submission is of critical importance for augmenting a tag and mitigating
data gathering challenges. Even minor or incomplete tag comments can give net.Tagger
researchers insights into the validity and relevance of a given tag for making further
inferences. The more tags a user submits, the more likely he or she will begin to build a picture
of what infrastructure indicator trends exist in the local area, and which of their findings
are unique or relevant in a broader context. Ideally, as the net.Tagger user base grows and
matures, tag comments will grow in importance and usefulness. Even in net.Tagger’s current
phase, tag comments are an important tool to fill in information gaps not covered by the
app’s dropdown options in the data submission screen. Putting too many options in a menu
clutters the UI and risks discouraging users from completing a submission. Once the net.Tagger
dataset is large enough, the app can be modified to offer a location-aware selection that presents
the most prevalent providers in the area. This can be further combined with
on-device caching of users’ past submissions to simplify the submission process
on an individual basis. Tag comments not only clarify submissions but also provide
additional data sources for net.Tagger researchers to mine for possible app improvements.
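The location-aware provider selection described above could rank providers by how often they appear in nearby tags. The Python sketch below assumes tags as (lat, lon, provider) tuples, a 2 km default radius, four suggested entries, and fixed “Unknown” and “Other” fallbacks; all of these are illustrative choices, not current app behavior.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation
FALLBACKS = ["Unknown", "Other"]  # always offered, never ranked


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def provider_menu(lat: float, lon: float, tags: Iterable[Tuple[float, float, str]],
                  radius_m: float = 2000.0, k: int = 4) -> List[str]:
    """Most frequent named providers within radius_m, followed by the fixed fallbacks."""
    counts = Counter(
        provider
        for t_lat, t_lon, provider in tags
        if provider not in FALLBACKS and haversine_m(lat, lon, t_lat, t_lon) <= radius_m
    )
    return [name for name, _ in counts.most_common(k)] + FALLBACKS
```

Keeping “Unknown” and “Other” fixed at the end means less common providers stay reachable through the comments field, so a short menu does not hide them.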

As an example of tag comment utility, different telecommunications providers such as
AT&T and CenturyLink own or operate parts of the historic Bell System, often as independent
entities. Listing all these possibilities in the app’s data submission screen would
likely lead to user frustration. However, our initial findings showed that most users will
still clarify which specific Bell iteration they have discovered in their comments. Out of
the 35 tags users labeled as “Bell,” we received comments clarifying “Pacific Bell,” “Bell
Telephone,” “Pacific Telephone,” and “Bell System.” This amplifying information is useful
for determining local provider trends and isolating specific infrastructure features.

Unfortunately, many tags in the initial net.Tagger dataset either lack comments or do not
contain enough information to be useful. Of 166 tags, 51 did not include any comments,
representing approximately 31% of all submissions.
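Measuring comment coverage is a simple count. A Python sketch follows, assuming comments arrive as strings or `None` and treating empty strings and the literal `null` seen in the extracts as missing; the function name is our own.

```python
from typing import Iterable, Optional, Tuple


def missing_comment_rate(comments: Iterable[Optional[str]]) -> Tuple[int, float]:
    """Count tags without a usable comment and return (count, fraction of all tags)."""
    values = list(comments)
    if not values:
        return 0, 0.0
    missing = sum(1 for c in values if c is None or c.strip().lower() in ("", "null"))
    return missing, missing / len(values)


# 51 missing out of 166, as reported above (the comment text itself is a placeholder).
count, rate = missing_comment_rate([None] * 51 + ["Possibly AT&T"] * 115)
print(count, f"{rate:.1%}")  # 51 30.7%
```

A fuller check might also flag very short comments (a single word, for instance) as low-information, but any such threshold would be a judgment call.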


4.6 Errors and Noise 

Preliminary interpretation of the 166 tags at the time of this thesis shows a number of
complications. Because of the close ties of active users to the net.Tagger team, reaching
out for clarification was much more straightforward than it would be with a general public release.
This offered a temporary advantage in determining whether a submission was truly erroneous or only
appeared so based on the data available. For example, the following tag (Figure 4.17) was
submitted from downtown Monterey:


(136,AEEA6C4CA37F9BB9CC9CF78901C39EE37AF80D04,XXXX@XXXX,XX5984008,-XXX8957686,"2016-03-06 16:02:15-05",Unknown,"Duct Marking","2-4""ducts")



Figure 4.17: Duct Marking Tag 


The user’s data entry indicates a sidewalk duct marking, annotating the marking’s text
in the comments section. However, the image submitted with the tag shows a
duct marking that appears to be drawn in red paint, which by APWA standards would
indicate electrical power instead of telecommunications equipment. Based only on the
information in the tag entry and attached image, net.Tagger researchers would likely
conclude that the user mistakenly submitted a power cable duct as a telecommunications
asset, requiring reclassification of the tag as inaccurate. However, after discussing the tag
with its submitting user, we concluded that he could properly identify PMS 144 Orange,
and that local lighting conditions caused his smartphone camera to misrepresent the color of the
markings.
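Because marking color carries meaning under the APWA Uniform Color Code, a researcher-side check could compare a marking’s observed color against it. The app does not currently record marking color, so the `reported_color` input below is hypothetical. As the example above shows, photographed colors can be wrong, so this Python sketch only routes non-orange markings to human review rather than rejecting them.

```python
# APWA Uniform Color Code, paraphrased.
APWA_COLORS = {
    "red": "electric power lines, cables, conduit, and lighting cables",
    "orange": "telecommunication, alarm or signal lines, cables, or conduit",
    "yellow": "natural gas, oil, steam, petroleum, or other flammable material",
    "green": "sewers and drain lines",
    "blue": "potable water",
    "purple": "reclaimed water, irrigation, and slurry lines",
    "white": "proposed excavation",
    "pink": "temporary survey markings",
}


def triage_marking(reported_color: str) -> str:
    """Classify a marking color for review; never rejects a tag outright."""
    color = reported_color.strip().lower()
    if color == "orange":
        return "consistent"
    if color in APWA_COLORS:
        # Photographed color can be wrong (lighting, white balance), so flag, don't reject.
        return f"review: {color} usually marks {APWA_COLORS[color]}"
    return "review: unrecognized color"
```

Routing rather than rejecting keeps cases like the one above, where the camera rendered orange paint as red, recoverable through follow-up with the user.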

Other submissions (Figure 4.18) were clearly erroneous; however, verification was
straightforward because the users were careful to provide details in their comments.


(86,524B8FABBE717B5AACCEFC4383BE6B82176B8865,XXX@XXXX,XX5796949,-XXX6177637,"2016-02-05 14:40:39-05","Other (note in comments)",Manhole,"PacfiCorps electrical vault")



Figure 4.18: Electrical Vault Tag 


This submission came from a user lacking a networking background. Upon analysis, the
image lacks positive indicators of networking equipment, and PacifiCorp is a utility company
that does not provide telecommunications services. When interviewed, the user stated
that he was unsure about the find, but chose to submit with as many details as possible to
facilitate eventual verification. Vetting the tag was simple for the net.Tagger team, and the
same user submitted a number of high-quality tags in the adjacent area.
Other erroneous tags (Figure 4.19) did not have additional comments, but could still be
downgraded in reliability based on the image.


(112,855D69370A5AE4B5E8375091D85A98781B3C43C4,XXX@XXXX,XX3627900,-XX0911454,"2016-03-03 16:21:44-05",Qwest,Manhole,null)



Figure 4.19: Qwest Manhole Tag 


This submission was marked as a Qwest manhole with no amplifying comments. The manhole
bears the engraving “BECo,” which according to a source of limited reliability [50] is the marking
for Brooklyn Edison Company, a power utility company based in New York City. Given the
conflict between the tag data and the image, this data point is not reliable enough to be used
for future inferences without more information.
CHAPTER 5:
Future Work 


Some research projects complete their investigations and list “Future Work” ideas as an
afterthought with minimal content. Because this thesis represents the first effort in creating
the larger net.Tagger initiative, this chapter takes on significant importance. While the
net.Tagger project has a clearly defined goal (broad mapping of physical network infrastructure
through crowdsourcing), the specific implementation and requirements continue to
be refined. Implementing an initial mobile app and server framework, performing data
collection, and gathering user feedback allowed us to identify additional features and project
enhancements that will greatly increase the quality and utility of research findings going
forward.

This chapter addresses four categories of future work for net.Tagger. A primary area of
work will involve additions and enhancements to the smartphone app, including porting
to other platforms, enhancing the UI, and expanding the map overlay to include the entire
project dataset. Second, the backend server infrastructure must be upgraded. This includes
a full security audit, better web services handling, and integration with the OSM
stack and dataset to perform native map renders. Third, data analysis and data fusion will
greatly enhance the research value of the project dataset. Finally, and most important for
net.Tagger’s expansion and future, features and incentives must be developed to increase
adoption and use.


5.1 App 

5.1.1 User Interface 

While the UI has undergone considerable evolution over the course of this project, it is still
a product of the short development timeframe. Due to the increasing quality of most smartphone
apps, potential users are likely to judge new apps by their visual presentation, workflow
intuitiveness, and overall ease of use. Even if UI features do not directly increase the quality
of collected data, they are still important to net.Tagger’s success as a crowdsourcing project.
An intuitive user experience will present fewer barriers to entry, particularly for users lacking
a technical background who might be intimidated by a less usable set-up. The main display
can be improved through implementation of minor features such as slide-out menus in place
of static buttons, which crowd the display when not being used.

Multiple beta testers were concerned about their ability to submit accurate data. Because
these users already possessed technical backgrounds, their concerns indicate that the entire
spectrum of potential users would benefit from additional in-app resources guiding the
submission process. One feature accomplishing this is a tutorial-style walkthrough offered
to users upon a fresh install. Many apps provide a demonstration of this sort, since static
“Help” documentation does not always translate into practical understanding for all users.

Some testers reported an initial hesitancy to begin tagging for fear that they would make a
mistake and pollute the project database with false information. In addition to a walkthrough,
another feature that would ease their concerns is the ability for users to delete their own tags
if they feel a submission was in error. Currently, there is no delete mechanism in the app, yet
multiple testers inadvertently submitted tags when first exploring the app and voiced concern
that they could not clean up their mistakes. Giving users the power to delete tags allows them
to experiment without fear of making mistakes, not only reducing research errors but also
shortening the time between installation and feeling comfortable about participating. For
research purposes, deleting a tag in the app should not actually delete the information from
the net.Tagger database. Knowing how frequently users delete data relative to time spent
using the app is a useful metric for researchers. If multiple users submit and delete tags in a
specific location, this could indicate that an infrastructure indicator exists but is ambiguous
and needs further validation before it is used for network inferences. Instead of actually
deleting the submission from the net.Tagger database, the in-app delete option should flag
the appropriate database entry, remove the marker from the user’s map, and display a message
to the user that the tag is deleted. This assures the user that the net.Tagger team knows of the
error while still preserving the data for other purposes. Combining a tutorial walkthrough with
tag deletion capability should help users feel more confident about participating while
increasing the likelihood of correct submissions.
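The flag-instead-of-delete behavior described above is a standard soft-delete pattern. A minimal Python sketch using an in-memory SQLite database follows; the table layout and the `deleted_at` column are hypothetical, and SQLite is used only to keep the example self-contained, not because the net.Tagger server uses it.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tags ("
        " tag_id INTEGER PRIMARY KEY,"
        " user_id TEXT NOT NULL,"
        " tag_type TEXT,"
        " deleted_at TEXT)"  # NULL while the tag is live
    )
    return db


def user_delete(db: sqlite3.Connection, tag_id: int, user_id: str) -> bool:
    """Flag a user's own tag as deleted; the row itself is kept for research."""
    cur = db.execute(
        "UPDATE tags SET deleted_at = datetime('now') "
        "WHERE tag_id = ? AND user_id = ? AND deleted_at IS NULL",
        (tag_id, user_id),
    )
    db.commit()
    return cur.rowcount == 1


def map_markers(db: sqlite3.Connection, user_id: str) -> List[int]:
    """Tag IDs to draw on the user's map: only tags not flagged as deleted."""
    rows = db.execute(
        "SELECT tag_id FROM tags WHERE user_id = ? AND deleted_at IS NULL ORDER BY tag_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Researchers can still study flagged rows, for example deletions clustered at one location, by selecting `WHERE deleted_at IS NOT NULL`; the timestamp also supports the deletion-rate metric mentioned above.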

In addition to app UI improvements that will help users get started with net.Tagger, other
planned features will assist users while gathering data. One feature suggested by test users
is automated notifications once the user enters a new area with few or no tags. Users
will be able to enable this feature from a settings dialog and configure what
a “new area” consists of. For example, a user could set their net.Tagger instance to alert
them if they are more than a certain distance from any of their past submissions. Another
alert might trigger if the user is near an unverified tag from another user, indicating nearby
targets of opportunity. Not all users will want a notification feature, and it is of little utility
during dedicated tagging sessions, when users have the app open and can actively
see the map. However, other users might be interested in submitting intermittent tags while
they are performing other tasks, and would appreciate notifications informing them that
they are in a potential tagging location. The notification feature can be further integrated
with the upgraded map display to show helpful messages to users when it triggers.
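The distance-based “new area” alert reduces to checking the user’s distance to their nearest past submission. The Python sketch below uses the haversine distance on a spherical Earth; the function names, the (lat, lon) tuple layout, and the user-configurable threshold are illustrative, and a production version would presumably run on throttled location updates rather than continuously.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def in_new_area(current: Tuple[float, float],
                past_tags: Sequence[Tuple[float, float]],
                threshold_m: float) -> bool:
    """True if the user is farther than threshold_m from every one of their past tags."""
    if not past_tags:
        return True  # a user with no submissions is always in a new area
    nearest = min(haversine_m(current[0], current[1], lat, lon) for lat, lon in past_tags)
    return nearest > threshold_m
```

The “nearby unverified tag” alert is the inverse test run against other users’ unverified tags: notify when the nearest one falls within the threshold.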

5.1.2 App Backend 

In addition to the UI improvements of Section 5.1.1, some improvements to the app’s
backend are necessary before undertaking broader distribution efforts. Of primary importance
is improving the app’s location sensor routines, which define the precision and regularity
with which the app samples the user’s GPS coordinates. Currently, the app uses manually
coded location routines built on low-level Android location functions instead of
higher-level API methods. These provide the high accuracy necessary for accurate tag
measurements, but place an unreasonably high load on the smartphone’s battery. Android
developer guidance recommends using the native location tools available as part of the
Google Play services APIs, as they automate these processes to optimize battery life without
compromising location accuracy. Unfortunately, net.Tagger cannot make use of them until the
app is registered with the Google Play Store. Once they become available to the project,
refactoring the app’s code to use them will improve battery usage, reducing the potential
for users to become frustrated with the app. Market research surveys of app users [51]
identify battery issues as a motivating factor in users giving negative reviews or
uninstalling apps, particularly mapping applications. This gives net.Tagger an incentive to
use every available tool to manage device resources well.

Other smartphone sensors discussed in Section 2.6.2 can be leveraged to improve research
data without requiring active user action. The Android orientation sensor can directly
calculate the orientation of a device relative to magnetic north; however, it requires
substantial processing power and has been deprecated since Android 2.2 [52]. Android
provides methods that calculate equivalent results without utilizing raw orientation sensor
data. The GPS sensor can also be leveraged further. Currently, the app transmits only a
lat/long and blocks users from submitting if the GPS sensor’s calculated accuracy is worse
than 30.0 meters. Instead of enforcing an accuracy limit, the app will transmit the sensor’s
accuracy at the time of submission. The combination of lat/long, position sensor accuracy,
and device orientation for each tag will characterize its location far better than lat/long
alone.
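Deriving heading without the deprecated orientation sensor typically combines accelerometer (gravity) and magnetometer readings. The following plain-Python sketch reproduces that calculation, modeled on the math behind Android’s `SensorManager.getRotationMatrix()` and `SensorManager.getOrientation()`; it is illustrative rather than the Android implementation, and assumes both vectors are in the device frame (x to the right, y toward the top of the screen, z out of the screen) with the device roughly at rest.

```python
import math


def _cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("degenerate vector (free fall or field parallel to gravity)")
    return tuple(c / n for c in v)


def azimuth_deg(gravity, geomagnetic):
    """Heading of the device's +y axis relative to magnetic north, in [0, 360).

    gravity: accelerometer reading (x, y, z) at rest, pointing "up".
    geomagnetic: magnetometer reading (x, y, z) in the same frame.
    """
    h = _normalize(_cross(geomagnetic, gravity))  # horizontal, points magnetic east
    a = _normalize(gravity)                       # points up
    m = _cross(a, h)                              # horizontal, points magnetic north
    # Rows of the rotation matrix are h, m, a; Android's getOrientation()
    # takes azimuth = atan2(R[1], R[4]), which is atan2(h_y, m_y).
    return math.degrees(math.atan2(h[1], m[1])) % 360.0
```

On a device, the same azimuth (in radians) comes from `getRotationMatrix()` followed by `getOrientation()`; sending it alongside the fix’s `Location.getAccuracy()` value would supply the three quantities the paragraph above proposes to transmit.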

Another useful capability to implement would be the ability to store user submissions on 
the smartphone if network services are not available, permitting users in remote locations 
to tag findings for upload once service is restored. This feature would require careful 
implementation and configurability from the user’s settings menu. Mismanagement could 
place a burden on device storage and mobile data, particularly if the user accumulates a 
large number of findings before reentering a service area. These issues could be addressed 
by allowing users to place storage limits on the app and limit burst transmissions to when 
the phone is connected to wifi networks. This would function similarly to smartphones that 
avoid downloading app updates until connected to wifi, preventing excessive mobile data 
consumption. 
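The store-and-forward behavior could look roughly like the sketch below, assuming tags are serialized to bytes before storage and that the platform reports connectivity and Wi-Fi state. `OfflineTagQueue`, its parameters, and the `upload` callback are hypothetical; a real app would persist the queue to disk rather than keep it in memory.

```python
from collections import deque


class OfflineTagQueue:
    """Hypothetical store-and-forward queue for tags captured without service.

    max_bytes models the user's storage limit; when wifi_only is set,
    flush() uploads only on Wi-Fi, mirroring deferred app-update downloads.
    """

    def __init__(self, max_bytes, wifi_only=True):
        self.max_bytes = max_bytes
        self.wifi_only = wifi_only
        self._pending = deque()
        self._used = 0

    def enqueue(self, payload: bytes) -> bool:
        """Store a serialized tag; refuse it if it would exceed the storage limit."""
        if self._used + len(payload) > self.max_bytes:
            return False
        self._pending.append(payload)
        self._used += len(payload)
        return True

    def flush(self, connected, on_wifi, upload):
        """Upload queued tags in order via upload(payload) -> bool.

        Stops at the first failure so nothing is lost; returns the count sent.
        """
        if not connected or (self.wifi_only and not on_wifi):
            return 0
        sent = 0
        while self._pending:
            payload = self._pending[0]
            if not upload(payload):
                break
            self._pending.popleft()
            self._used -= len(payload)
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

Stopping at the first failed upload keeps tags in submission order and ensures none are dropped if service is lost partway through a burst.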

To facilitate future software development, the net.Tagger app should continually improve 
its error handling and crash reporting. Currently, the app utilizes the Application Crash 
Reports for Android (ACRA) library, which automatically sends stack traces and phone 
version information to a net.Tagger server upon full crashes. This proved very useful during 
the initial app release, when almost half of net.Tagger users experienced unrecoverable 
crashes during installation. ACRA crash reports quickly narrowed the problem to Android 
Version 6 smartphones, which utilize a radically different permissions model than versions 
used during development testing. Once identified, the issue was quickly patched and a new 
version pushed out. While these reports are invaluable, they only provide information when
the app experiences a complete crash, which should occur less frequently as the production
code evolves to better anticipate error conditions. These improvements come at the cost
of less information for troubleshooting. Even if the app handles errors without crashing
outright, some features may still not function as intended. While coding and testing,
net.Tagger developers can use debugging features such as Android Studio’s LogCat to view
helpful messages about the app’s state. Before large-scale distribution, net.Tagger should
implement improved logging systems that send relevant information about experimental or
failure-prone processes to net.Tagger servers. Unlike the current beta, a full-scale release
will not allow developers to contact users about their issues as readily, so automated
processes must be put in place to collect relevant information.
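One way to approximate LogCat-style visibility in production is to buffer log records on the device and ship them in batches. The sketch uses Python’s standard `logging.handlers.BufferingHandler` to show the pattern; the `send` callable stands in for an HTTPS upload to a hypothetical net.Tagger logging endpoint, and the logger name and record fields are illustrative.

```python
import json
import logging
import logging.handlers


class RemoteBatchHandler(logging.handlers.BufferingHandler):
    """Buffers log records and ships them as one JSON batch.

    `send` is a hypothetical callable standing in for an HTTPS POST
    to a net.Tagger logging endpoint.
    """

    def __init__(self, capacity, send):
        super().__init__(capacity)
        self.send = send

    def flush(self):
        # Called automatically once `capacity` records are buffered, and on close().
        self.acquire()
        try:
            if self.buffer:
                batch = [{"time": r.created, "level": r.levelname,
                          "logger": r.name, "msg": r.getMessage()}
                         for r in self.buffer]
                self.send(json.dumps(batch))
                self.buffer.clear()
        finally:
            self.release()


# Usage: collect batches in a list instead of sending them over the network.
batches = []
log = logging.getLogger("nettagger.upload")
log.setLevel(logging.INFO)
log.addHandler(RemoteBatchHandler(capacity=2, send=batches.append))
log.info("image upload started")
log.warning("retrying upload, attempt %d", 2)  # second record triggers a flush
```

Batching trades immediacy for fewer network requests, which matters for the same battery and data concerns raised earlier in this section.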

5.1.3 Distribution 

A successful crowdsourcing project relies on effective advertising and providing a simple 
way for potential users to obtain and install the app. Currently, the net.Tagger app exists 
as an .apk file download on a Center for Measurement and Analysis of Network Data 
(CMAND) website. This approach requires users to visit the website, manually download 
the .apk file, disable their smartphone’s security protections against unverified third-party
apps, and finally install the app. While sufficient for initial beta testers already associated
with CMAND, this implementation is not suitable for wider distribution. The next logical
step is signing, registering, and importing the app into the Google Play Store. In addition
to raising net.Tagger’s profile within its potential user community, Play Store release
addresses trust: many smartphone owners will not install anything from outside official
distribution channels, and release through the Play Store removes many of the security
concerns users might have with a third-party app. Also
important to the project’s success is the ability to push out updated versions of the app to 
users as the improvements described in this chapter are implemented. Hosting the app as 
a file download on the net.Tagger website requires users to download fresh copies every 
time a release is made. The effort this entails reduces the likelihood users will perform the 
extra step, hindering the project’s ability to grow and expand. Integration with the Play 
Store gives project developers the means to release updates with a far greater certainty that 
users will receive and automatically install them. The Play Store also provides users with 
the means to assign numerical ratings and reviews of apps, which gives net.Tagger another 
source of feedback. While a useful asset, Play Store feedback also increases the importance
of identifying and removing as many bugs as possible before release, as bad initial reviews
could discourage potential users from installing. To this end, net.Tagger should ensure 
compliance with Android’s published series of quality control guidelines [53] before app 
release. Once better mechanisms of distribution are in place, net.Tagger can take advantage 
of additional resources to more broadly advertise the project. Resources such as the North 
American Network Operators Group (NANOG) Mailing List or OSM forums can be used
to both increase project visibility and solicit feedback. 

5.1.4 Platform Porting 

Currently, net.Tagger only exists for Android devices. The Android development community
provided many useful features and resources that were a key factor in producing a
usable prototype within the time constraints of this project. However, limiting net.Tagger
to Android would neglect the sizable market share of potential users who use other
smartphone platforms such as Apple’s iOS. In late 2015, iOS represented approximately 28%
of the US market share, second to Android’s 67% but well ahead of Windows’ third place
3.5% [54]. Technologically, it is not possible to port or cross-compile net.Tagger’s
Java-based Android code directly to iOS’s Objective-C. However, the UI design, workflow,
and server infrastructure can be reused, amortizing the cost of design and testing of these
components. Instead of being written from scratch, the iOS app can be built to an existing
specification and template, thereby presenting fewer challenges to an experienced programmer.

5.1.5 Map Display 

Currently, the net.Tagger app main screen displays the individual user’s submission history 
in the form of markers placed on a Google Map overlay. The app accomplishes this by 
keeping a local data file holding their past tags in the app’s private directory. Every time the 
user submits a tag, the file is updated and the map reloaded to enter the marker. Although 
the data file can store many different types of data, the only information currently stored 
is a tag id and lat/long for each submission. The main advantage of this approach is that 
it requires no management of a distributed dataset. Each user’s smartphone maintains a 
local copy of its history while sending more detailed submission reports to the central 
server. A better app configuration would display markers representing most or all of the
net.Tagger dataset to indicate areas that have already been searched. Users should be able
to set a variety of display filters on their map, including all tags by all users, all tags
by the smartphone’s owner, all unverified tags, and tags by indicator type. This will allow
users to scale back their display if app performance and mobile data are an issue, and will
assist users conducting searches that target specific leaderboard categories. Users could
then investigate existing findings to submit verification tags, or avoid them in order to
search for original findings. Including this feature will require additional network
functionality and careful consideration to avoid burdening users’ smartphones. Other
applications such as Google Maps also allow users to tag features such as gas stations and
restaurants. However, implementing this in net.Tagger will require extra caution due to the
substantial amount of data that must be pushed to users in areas with a high infrastructure
indicator density. With careful planning and scheduling of data pushes to users, net.Tagger
will be able to provide a dynamic, informative display to its users without incurring
performance or data consumption issues.
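The display filters listed above amount to a simple predicate over tag records, applied within a bounding box so that only tags in the visible map area need to be pushed to the phone. The sketch assumes a hypothetical `Tag` record carrying owner, indicator type, and verification status; it does not handle bounding boxes that cross the 180° meridian.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Tag:
    tag_id: int
    lat: float
    lon: float
    owner: str
    indicator: str   # e.g., "manhole", "marking" (hypothetical labels)
    verified: bool


def filter_tags(tags: Iterable[Tag],
                bbox: Tuple[float, float, float, float],
                owner: Optional[str] = None,
                unverified_only: bool = False,
                indicators: Optional[Set[str]] = None) -> List[Tag]:
    """Keep tags inside bbox=(south, west, north, east) that match the
    user's display filters; None means "no restriction"."""
    south, west, north, east = bbox
    out = []
    for t in tags:
        if not (south <= t.lat <= north and west <= t.lon <= east):
            continue  # outside the visible map area
        if owner is not None and t.owner != owner:
            continue
        if unverified_only and t.verified:
            continue
        if indicators is not None and t.indicator not in indicators:
            continue
        out.append(t)
    return out
```

Running the same filter on the server before a data push, rather than on the phone after it, is what would keep data consumption down in dense areas.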


5.2 Server 

5.2.1 Security Considerations 

net.Tagger was intentionally designed to limit the amount of sensitive data it transmits
and stores. User submissions, including profile data, tag data, and images, are sent via
HTTPS POST messages using Android’s built-in certificate store. This delegates the security
of sensitive data in transit to existing security implementations, providing a higher level
of security than creating custom net.Tagger transmission protocols. A more likely risk
comes from a breach of data residing on the net.Tagger server. Without the convenience of
the built-in methods available to the app, the net.Tagger server must host and secure
multiple web and database services while ensuring their availability for all required
processes. The simplest means of securing data at rest on the net.Tagger server is to
refrain from storing data that requires securing. A user profile only contains a nickname,
email address, country, and password. The only information intended to be uniquely
identifying is the email address, which is used to distinguish users for research purposes,
and the nickname, which will be publicly available on the leaderboard once implemented.
This reduces both the potential consequences of a data breach and the likelihood of
attackers viewing net.Tagger as a worthwhile target. However, this does not eliminate the
need for the net.Tagger research team to protect PII entrusted to them by the user
community. Because people tend to reuse passwords and email addresses when registering
for web services, access to the four components of a net.Tagger profile could give
attackers information useful for targeting users on websites unrelated to net.Tagger.
Limiting user data reduces security requirements of the project to following best practices
and using built-in features of its native software packages. net.Tagger backend components 
such as Apache and PHP have established security practices dictated by their own [55] or 
third party foundations [56] providing guidance that is sufficient to secure most simple web 
applications using their products. Basic security precautions for net.Tagger are in place, 
such as storing user passwords in the profile database after hashing and salting with PHP’s 
native password handling features. However, because of this project’s short development 
time, a full security audit of the app and backend server is still pending. 
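In PHP, salted password hashing of this kind is handled by `password_hash()` and `password_verify()`, which default to bcrypt. Python’s standard library has no bcrypt, so the sketch below substitutes PBKDF2-HMAC-SHA256 to show the same idea: a unique random salt per user, a deliberately slow hash, and a constant-time comparison. The storage format and iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune to server hardware


def hash_password(password: str) -> str:
    """Return "iterations$salt_hex$hash_hex" for storage in the profile table."""
    salt = os.urandom(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                 bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Storing the iteration count with each hash lets the work factor be raised later without invalidating existing accounts.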

Any audit will have to take into consideration three possible attacker objectives: data theft, 
data corruption, and service interruption. Data thieves would target user profile or tag data. 
Both types of data include database entries, with tag data also including separately stored
image files. Tag images would be of little utility without the accompanying database entries 
to correlate them to users and locations, so any data theft attacks would involve some form 
of database attack. 

Data corruption attacks would attempt to either delete or corrupt data stored on the server
or insert false data points. Instead of exfiltrating data, these adversaries actively seek to 
modify data on the server. While more disruptive, modification attacks are harder to execute 
against the net.Tagger server because most of them would require some form of superuser 
permission. The PHP scripts that interface between received tag data and the databases 
do not have modification or delete database privileges, which exist only for the postgres 
superuser. 

An attacker could attempt to craft fake tag submissions, which are simple HTTP POST 
messages carrying JSON data and could be easily replicated. However, the server scripts
will not accept submissions without a valid session ID from an app instance, which can only 
be generated by submitting credentials that match profile entries on record in the database. 
Even though corruption attacks may be more difficult to launch, the security audit should 
still ensure that all Apache, PHP, and database instances are locked down to reduce their 
likelihood of occurring. 
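The session check that blocks forged submissions can be sketched as follows. This is a minimal, in-memory illustration assuming a credential-checking callable and a `session_id` field in each submission; `SessionStore`, `accept_tag`, and the 24-hour lifetime are hypothetical, and the actual net.Tagger scripts are PHP working against the postgres database.

```python
import secrets
import time

SESSION_TTL_S = 24 * 3600  # hypothetical session lifetime


class SessionStore:
    """In-memory stand-in for the server's session table (hypothetical)."""

    def __init__(self, check_credentials):
        self._check = check_credentials   # callable(email, password) -> bool
        self._sessions = {}               # token -> (email, expiry timestamp)

    def login(self, email, password, now=None):
        """Issue an unguessable session token only for valid credentials."""
        if not self._check(email, password):
            return None
        token = secrets.token_urlsafe(32)
        now = time.time() if now is None else now
        self._sessions[token] = (email, now + SESSION_TTL_S)
        return token

    def user_for(self, token, now=None):
        """Return the email bound to a live token, or None (reject the tag)."""
        now = time.time() if now is None else now
        entry = self._sessions.get(token)
        if entry is None or entry[1] < now:
            self._sessions.pop(token, None)  # forget expired tokens
            return None
        return entry[0]


def accept_tag(store, submission):
    """Accept a submission dict only if it carries a valid session ID."""
    return store.user_for(submission.get("session_id")) is not None
```

A crafted POST without a token issued by `login` fails `accept_tag`, which is the property the paragraph above relies on.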

Finally, service interruption attacks would attempt to deny net.Tagger server availability
through some form of Denial of Service (DoS) attack. These adversaries could send
large numbers of web requests or make net.Tagger submissions that do not require session
credentials, such as submitting profile data to fill up the database. Although there are
limited remediations against these attacks, an audit could ensure that the net.Tagger server
has enough scalable resources available to adapt to any DoS attempts.

5.2.2 OSM Integration 

net.Tagger is heavily inspired by the OSM project and will likely draw upon the OSM 
software stack and dataset for future work. Because of OSM’s open source philosophy
and licensing, net.Tagger can employ these resources without any reimbursement or
compensation as long as any use is properly credited. An explicit goal of the project is the
eventual integration of net.Tagger’s data into the OSM community. Further, because the
OSM community represents a large population of users whose motivations are similar to
those of the desired net.Tagger user community, e.g., individuals who voluntarily annotate
maps, bidirectional interaction between net.Tagger and OSM is a potential means of
furthering net.Tagger’s goals. Such integration could be accomplished by importing verified
net.Tagger data into the OSM dataset. As part of its implementation philosophy, OSM
emphasizes above-ground features that can be verified by other mappers, with no real means
to record virtualized inferences of below-ground networks [57]. However, the street-level
infrastructure indicators from net.Tagger can be recorded in OSM much like other
street-level OSM features such as bike racks or utility poles. Importing part or all of the
eventual net.Tagger dataset into OSM is not without potential disadvantages, and would only
happen after a careful cost-benefit analysis. Any import could only take place after
interacting with and gaining approval from the OSM Import Mailing List [58] to ensure that
the bulk data met OSM standards and was appropriately categorized.
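An import would ultimately require net.Tagger tags to be expressed as OSM nodes. The sketch below emits OSM XML (version 0.6) for verified tags, using negative ids for new, not-yet-uploaded objects as editors such as JOSM do. The indicator-to-OSM tag mapping is an assumption for illustration only; any real mapping would have to be checked against the OSM wiki and agreed on through the Import Mailing List.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from net.Tagger indicator types to OSM tags; values
# must be checked against OSM tagging conventions before any real import.
OSM_TAGS = {
    "telecom_manhole": {"man_made": "manhole", "manhole": "telecom"},
}


def tags_to_osm_xml(tags):
    """Build an OSM XML document with one new node per verified, mappable tag.

    Each tag is a dict with "indicator", "lat", "lon", "verified" and an
    optional "provider" name. Negative ids mark objects not yet uploaded.
    """
    root = ET.Element("osm", version="0.6", generator="net.Tagger-sketch")
    next_id = -1
    for t in tags:
        mapping = OSM_TAGS.get(t["indicator"])
        if mapping is None or not t.get("verified"):
            continue  # only verified, mappable indicators are exported
        node = ET.SubElement(root, "node", id=str(next_id),
                             lat=f"{t['lat']:.7f}", lon=f"{t['lon']:.7f}")
        for k, v in mapping.items():
            ET.SubElement(node, "tag", k=k, v=v)
        if t.get("provider"):
            ET.SubElement(node, "tag", k="operator", v=t["provider"])
        next_id -= 1
    return ET.tostring(root, encoding="unicode")
```

Filtering on verification inside the export keeps unverified inferences out of OSM, consistent with OSM’s emphasis on features other mappers can confirm.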

5.2.3 Native Renders 

Currently, the only means to render tag data in a map overlay is through the app’s Google 
Maps API. The Google Maps API was chosen as an expedient way to meet the project’s 
time constraints. Although useful for prototyping, long-term reliance on a proprietary
mapping API conflicts with several of net.Tagger’s core objectives. net.Tagger aims to
provide map renders on multiple platforms, including Android, iOS, and web browsers.
Additionally, net.Tagger seeks to maintain as much compatibility with OSM as possible to permit
use of and possible future integration with the OSM dataset. Finally, most members of
net.Tagger’s target user community are associated with open source projects and initiatives
that emphasize information sharing and openness of data and methods. Considering these
factors, migrating map renders to an open source, OSM-compatible approach is a logical
next step for both web and app displays. Fortunately, the OSM software stack meets all of
these criteria. Although there is no single standard OSM approach to rendering and serving
map tiles, a common community approach uses the open source rendering software known
as Mapnik [59] [60] in combination with helper packages to pull data from a PostGIS
database, overlay it onto an existing GIS dataset (such as the OSM planet file), and serve
the resulting map tiles via an Apache web server. Various OSM sub-communities provide
documentation of their setups to assist others in deploying map servers using free, open
source software. Various toolkits also exist to directly integrate OSM data into apps. One
example is OSMDroid [61], an open source toolkit using OSM data as a direct replacement
for most Google Maps API features. This would permit a straightforward port of the
net.Tagger app from Google Maps to OSM-based displays without requiring extensive code
rewrites. The net.Tagger project can incorporate these resources as part of its expanded
web and app presence.
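Whichever renderer is used, OSM-style tile servers and client toolkits commonly address map tiles by zoom level and Web Mercator tile coordinates. The conversion below follows the widely documented “slippy map” tile-name formula; the tile server URL is a placeholder, not a net.Tagger server.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Web Mercator "slippy map" tile (x, y) containing a point at a zoom level."""
    n = 2 ** zoom  # tiles per axis at this zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 and latitudes beyond the Mercator limit stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(x, y, zoom, base="https://tiles.example.org"):
    """Hypothetical tile server URL in the common {z}/{x}/{y}.png layout."""
    return f"{base}/{zoom}/{x}/{y}.png"
```

A web or app client would request the handful of tiles covering its viewport at the current zoom, and a Mapnik-based server would render and cache those tiles on demand.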


5.3 Data Analysis 

While much of this thesis covers net.Tagger’s crowdsourcing implementation, the core goal
of the project remains analyzing collected data and drawing useful physical network topology
inferences. Before useful analysis can take place, collected data must be initially
categorized and vetted. A key part of this process is extracting information from submission
images and augmenting the user’s form data inferences. However, the anticipated volume of
data means that manual inspection by the small project team is not feasible. Several
possibilities exist to automate or outsource this process.

5.3.1 Image Recognition 

Although image recognition technology has limitations, it still represents a potential means
to identify net.Tagger’s targets. Many of the indicators in Section 2.5 have distinct shapes,
such as circles (manhole covers) and rectangles (handholes), or distinct colors (PMS 144
Orange). Image recognition software could theoretically search for these predetermined
shapes and colors in user images and check them against what the user identified as the
find. Depending on the image quality and camera perspective, markings and text in images
could potentially be analyzed with Optical Character Recognition (OCR) software as well;
however, human-based verification will also play a large role. More complex shapes such as
cell towers and buildings may not lend themselves to automated cataloguing, due to their
lack of a generalized shape or intentional obfuscation, as discussed in Section 2.5.6.
However, all other infrastructure indicators possess a specific shape that can be targeted
with image recognition software.
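As a rough illustration of color-based screening, the sketch below counts pixels close to an approximate sRGB value for PMS 144 orange. Both the reference color (237, 139, 0) and the thresholds are assumptions: paint, weathering, lighting, and camera white balance all shift the recorded color, so a real pipeline would likely use an image library, a perceptual color space, and thresholds tuned on labeled submissions.

```python
# Approximate sRGB rendering of PMS 144 C (an assumption, not a measured value).
PMS_144_RGB = (237, 139, 0)


def color_fraction(pixels, target=PMS_144_RGB, tolerance=60.0):
    """Fraction of (r, g, b) pixels within `tolerance` (Euclidean RGB distance) of target."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    tr, tg, tb = target
    hits = sum(1 for r, g, b in pixels
               if ((r - tr) ** 2 + (g - tg) ** 2 + (b - tb) ** 2) ** 0.5 <= tolerance)
    return hits / len(pixels)


def likely_orange_marking(pixels, min_fraction=0.02):
    """Flag an image for marking review if enough pixels match the target color."""
    return color_fraction(pixels) >= min_fraction
```

A color screen like this is cheap enough to run on every submission and only needs to decide which images deserve closer (possibly human) inspection.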

5.3.2 Mechanical Turk 

To extract more detailed information from images, net.Tagger could integrate with Amazon’s
Mechanical Turk service [62]. Mechanical Turk is a crowdsourced Amazon Web Service
(AWS) allowing individuals, researchers, or businesses to submit Human Intelligence
Tasks (HITs), small chores that are difficult to complete via computer but easily
accomplished by a human being. Workers perform the tasks and receive a small payment
for each HIT, usually on the order of a few cents. Mechanical Turk lends itself well to
image processing, particularly matching patterns or extracting text. These capabilities could
be employed to verify images such as the previously mentioned cell towers and buildings.
A sample HIT might involve presenting an image that a net.Tagger user categorized as
a Level 3 Telecommunications building, then asking the Mechanical Turk worker questions
such as “Is this a picture of a building? What company names are present?” Mechanical
Turk could also be used to supplement automated image recognition. For example, orange
street markings frequently contain descriptive labels written freehand in street paint that are
far less legible than stamped manhole inscriptions. If image recognition software detects
the PMS 144 color in a user submission, the image could be redirected to Mechanical Turk
to ask whether any phrases appear in the picture.
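The division of labor between automated checks and Mechanical Turk could be expressed as a routing rule. The sketch below only decides which questions a human should answer; it does not call the Mechanical Turk API, and the category labels, `ReviewTask` type, and question wording (adapted from the examples above) are hypothetical. Publishing each task as a HIT would be a separate step, for example through the service’s `CreateHIT` operation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Indicator categories the text identifies as poor fits for shape matching.
HUMAN_REVIEW_CATEGORIES = {"cell_tower", "building"}


@dataclass
class ReviewTask:
    tag_id: int
    questions: List[str] = field(default_factory=list)


def route_submission(tag_id, category, has_image, orange_detected) -> Optional[ReviewTask]:
    """Decide which human-review questions (if any) a submission needs.

    Returns None when there is no image or automated checks are sufficient.
    """
    if not has_image:
        return None  # nothing for a worker to look at
    task = ReviewTask(tag_id)
    if category in HUMAN_REVIEW_CATEGORIES:
        task.questions.append(f"Is this a picture of a {category.replace('_', ' ')}?")
        task.questions.append("What company names are present?")
    if orange_detected:
        # Freehand paint is hard for OCR, so ask a person to transcribe it.
        task.questions.append("Are any words or phrases painted in the image? "
                              "If so, transcribe them.")
    return task if task.questions else None
```

Because each HIT costs money, routing only the ambiguous cases to workers keeps per-submission cost proportional to how hard the image is to interpret automatically.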


5.4 User Incentives 

The success of any crowdsourcing project relies on a simple principle: the project must 
provide its users with reasons motivating them to join, contribute, and continue participating
long enough to provide useful data. Incentives can take many forms, including
monetary, prestige, or conditional access to an asset. Depending on the resources available 
to a project, multiple forms of incentives can be combined to target a larger potential user 
base. 

5.4.1 Leaderboard 

The planned incentive net.Tagger will incorporate into its initial large-scale deployment 
involves recognizing users based on the quantity, quality, and type of their submissions. 
These rankings will be displayed in an online “leaderboard” displaying users according 
to their tagging accomplishments. A key advantage of such a system is that net.Tagger 
administrators can assign points (or negative points) to different types of actions that factor 
into a user’s ranking score. Possible point strategies for different categories of submission 
include: 

• Submitting an original tag with an accompanying image and user comments. This
would be worth the maximum number of points, as it provides not only the standard
submission data, but also a means of verification. For example, if a user selects one
infrastructure type from the app UI but enters comments about a different type,
researchers can assign a lower probability that the submission is accurate. An image
provides even better verification ability, since researchers can clearly see whether a user
inferred correct information about a submission.

• Submitting an original tag without an image or comments. In order to account for
users with constraints on their time or phone data plans, net.Tagger provides the ability
to submit tags containing only app form data and GPS sensor information. These
submissions are still useful, particularly if verified through multiple users tagging
the same find. However, they provide less data than a full submission, and would be
worth fewer points.

• A bonus for submitting an especially valuable tag. A unique feature of net.Tagger
is its ability to gather data about infrastructure indicators that only exist temporarily,
primarily orange street markings that eventually fade and wash away (Section 2.5.1).
These markings provide some of the best data, including the orientation of the
infrastructure relative to the street. Because the markings exist for a much shorter time
than more permanent infrastructure such as manhole covers, any indicated provider name is
more likely to be current and accurate. The leaderboard algorithm can provide a point
bonus for submission and verification of temporary markings, encouraging users to
seek them out before they disappear.

• Verifying another user’s submission. To increase the validity of research data, users
can be prompted to seek out and verify other submissions. This feature cannot be
implemented until the enhanced map display (Section 5.1.5) is in place. A verification
feature could be presented to users as a means for newer users to gain early points.
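A point scheme along these lines can be expressed compactly. The values below are purely illustrative assumptions, chosen only so that a full submission outranks a form-only one and temporary markings earn a bonus whether they are tagged or verified.

```python
# Illustrative point values; actual weights are left to net.Tagger administrators.
BASE_POINTS = 4          # original tag with form and GPS data only
IMAGE_POINTS = 4         # extra for an accompanying image
COMMENT_POINTS = 2       # extra for user comments
VERIFICATION_POINTS = 3  # verifying another user's tag
TEMPORARY_BONUS = 5      # short-lived orange street markings


def score_submission(original, has_image=False, has_comment=False,
                     temporary_marking=False):
    """Points for one submission under the hypothetical scheme above."""
    if original:
        points = BASE_POINTS
        points += IMAGE_POINTS if has_image else 0
        points += COMMENT_POINTS if has_comment else 0
    else:
        points = VERIFICATION_POINTS
    if temporary_marking:
        points += TEMPORARY_BONUS  # applies to both tagging and verifying
    return points


def user_score(submissions):
    """Total leaderboard score from a list of keyword-argument dicts."""
    return sum(score_submission(**s) for s in submissions)
```

Keeping the weights as named constants lets administrators retune incentives (or add negative points) without touching the scoring logic.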


The verification feature introduces new error handling abilities, but must be handled
carefully to avoid unintended consequences. Allowing users to essentially “challenge”
submissions made by others if they cannot replicate the same results might provide an
incentive to submit false challenges, earning points for the challenger while subtracting
points from the original tagger. Unethical users trying to attain and stay at the top of the
leaderboard could easily take advantage of verifications. Even discounting the potential
effects of user misconduct, other situations might produce negative results as well. Because
of their non-permanence, orange street markings disappear after a relatively short amount of
time, and a user attempting to verify them weeks or months after the original tag could find
nothing and submit a challenge even though the initial tag was correct. The variable accuracy
of smartphone GPS units means that a tagged item is not necessarily located exactly where the
tag lat/long indicates, but somewhere within a circle with a radius equal to the GPS error.
In dense urban areas with high concentrations of infrastructure indicators, a verifying user
might go to a tagged location, mistake one infrastructure indicator for another, and
erroneously verify or challenge the wrong indicator. The verification process will require
careful planning to avoid exploitation or inadvertently introducing additional errors into the
net.Tagger dataset.
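Because each GPS fix is really a circle rather than a point, one simple consistency test for verification is whether two tags’ error circles overlap. The sketch assumes each tag carries the sensor accuracy that Section 5.1.2 proposes to transmit, uses an equirectangular distance approximation suited to short ranges, and flags a verification as ambiguous when more than one known tag is within reach. Android reports accuracy as a 68% confidence radius, so overlap indicates consistency, not proof.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate over the tens of meters involved."""
    r = 6_371_000.0
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)


def could_be_same_feature(tag_a, tag_b):
    """True if the two tags' GPS error circles overlap.

    Each tag is (lat, lon, accuracy_m).
    """
    (lat1, lon1, acc1), (lat2, lon2, acc2) = tag_a, tag_b
    return distance_m(lat1, lon1, lat2, lon2) <= acc1 + acc2


def ambiguous_verification(verifier, known_tags):
    """More than one overlapping known tag means the verifier may be
    looking at a different indicator than the one originally tagged."""
    matches = [t for t in known_tags if could_be_same_feature(verifier, t)]
    return len(matches) > 1
```

Ambiguous verifications could be held back from point awards rather than counted as confirmations or challenges, limiting the error modes described above.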

In addition to a web-based leaderboard, the app will eventually have a local leaderboard of 
its own. The online leaderboard has the advantage of immediate access to the net.Tagger 
database, making calculation and display of the entire user community straightforward. 
Pushing out these results to the distributed network of user smartphones, however, is less 
simple. As a compromise, each smartphone’s leaderboard might display a smaller subset of
results. This can be automated simply, with each app instance requesting updated results
from the net.Tagger server once per day and receiving the ranked top ten as well as the 
standing of the user associated with the specific instance. 
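The server side of the once-per-day request could compute its response as below: the top ten entries plus the requesting user’s own standing. The function name and the tie-handling rule (tied scores share a rank) are assumptions.

```python
def leaderboard_view(scores, user, top_n=10):
    """Return (top_n entries, user's (rank, score)) from a {nickname: score} map.

    Entries are (rank, nickname, score); tied scores share a rank
    ("1224" competition ranking). The user's standing is None if unknown.
    """
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked = []
    for i, (name, score) in enumerate(ordered):
        same_as_prev = ranked and ranked[-1][2] == score
        rank = ranked[-1][0] if same_as_prev else i + 1
        ranked.append((rank, name, score))
    top = ranked[:top_n]
    mine = next(((r, s) for r, n, s in ranked if n == user), None)
    return top, mine
```

Sending only this small slice keeps the daily push to each phone roughly constant in size, regardless of how large the user community grows.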

To further encourage competition, the user community can be permitted to form teams 
ranging from small groups of peers to entire countries. Displaying leaderboard rankings 
by country can be done with minimal extra effort because the information is included in 
each user profile. Allowing users to form additional groups would foster collaboration on 
a smaller scale. 

5.4.2 Micropayments 

Much like Amazon’s Mechanical Turk, users could be paid a small amount in money or 
some form of credit. This feature would not be feasible without project sponsorship, and 
would thus be reserved for more mature releases. Because users might be tempted to submit
false data to gain monetary rewards, delaying this feature would also allow fine-tuning of
the verification process to better identify and prevent user fraud. Paying for every
submission from every user would invite exactly this fraud, with users submitting fake tags
to artificially boost their numbers. Users would therefore likely be required to undergo
additional registration or vetting before becoming eligible to receive compensation. They
might initially be required to submit a certain number of verified tags, and only begin
receiving compensation after passing a predetermined threshold. Even though this increases
the administrative burden on project administrators, only a small number of users would
likely qualify for this feature. As OSM demonstrates [32], the majority of high-quality
submissions would likely come from only a few percent of project participants. In order
to increase the difficulty of faking a tag, compensation would be limited to submissions
including images.
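The eligibility rules above can be summarized in a short check. The threshold value is a placeholder for the “predetermined threshold” the text leaves open, and in this sketch every image-bearing submission becomes payable once a vetted user qualifies; whether earlier submissions count retroactively is a policy choice.

```python
# Hypothetical threshold standing in for the "predetermined threshold".
VERIFIED_TAG_THRESHOLD = 50


def payable_submissions(user_history, vetted):
    """Return the submissions eligible for payment.

    Only vetted users with at least VERIFIED_TAG_THRESHOLD verified tags
    are paid, and only for submissions that include an image.
    """
    verified = sum(1 for s in user_history if s["verified"])
    if not vetted or verified < VERIFIED_TAG_THRESHOLD:
        return []
    return [s for s in user_history if s["has_image"]]
```

Gating on both vetting and a verified-tag count means a new account cannot earn money until it has already contributed data that others have confirmed.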


5.4.3 Dataset 

Like OSM, net.Tagger’s potential users exist on a spectrum, from casual users participating 
as a novelty to more dedicated, enthusiastic users with technical backgrounds employed 
in related areas of research or academia. Less invested users are unlikely to be interested
in the accumulated project data beyond viewing maps of their findings. However,
users working in similar research areas might desire access to portions of the net.Tagger
dataset. Where micropayments would target high-performing individual users, access to
part of net.Tagger’s dataset would be an incentive aimed at research groups or similar
entities providing some benefit to net.Tagger through established relationships. Much like
micropayments and exporting data to the OSM project, providing other researchers access 
to the net.Tagger dataset would not be implemented until the project matures, in contrast to 
leaderboard implementation, which is of immediate interest. 